[jira] [Resolved] (DRILL-4786) Improve metadata cache performance for queries with multiple partitions

Aman Sinha (JIRA) Wed, 27 Jul 2016 17:27:37 -0700

     [ 
https://issues.apache.org/jira/browse/DRILL-4786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Aman Sinha resolved DRILL-4786.
-------------------------------
       Resolution: Fixed
    Fix Version/s: 1.8.0

Fixed in commit #: 69a44ed

> Improve metadata cache performance for queries with multiple partitions
> -----------------------------------------------------------------------
>
>                 Key: DRILL-4786
>                 URL: https://issues.apache.org/jira/browse/DRILL-4786
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Metadata, Query Planning & Optimization
>    Affects Versions: 1.7.0
>            Reporter: Aman Sinha
>            Assignee: Aman Sinha
>             Fix For: 1.8.0
>
>
> Consider queries of the following type, run against Parquet data with
> metadata caching enabled:
> {noformat}
> SELECT col FROM `A` WHERE dir0 = 'B' AND dir1 IN ('1', '2', '3')
> {noformat}
> For such queries, Drill reads the metadata cache file from the top-level
> directory 'A', which is inefficient because only the files in some
> subdirectories of 'B' are needed. DRILL-4530 improves the performance of
> such queries when the leaf-level directory is a single partition. Here,
> the IN list selects 3 subpartitions. We can build upon the DRILL-4530
> enhancement by at least reading the cache file from the immediate parent
> level `/A/B` instead of the top level.
> The goal of this JIRA is to improve performance for such types of queries.
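
The directory choice described above can be sketched roughly as follows. This is a minimal illustration, not Drill's actual planner code: the function name `cache_dir_for_filters` and the list-of-lists representation of the dir0/dir1 filters are assumptions made for this example. The idea is to descend one directory level for each filter that pins a single value and to stop at the first level that selects several subpartitions.

```python
import posixpath


def cache_dir_for_filters(table_root, dir_filters):
    """Return the deepest directory whose cache file covers all selected partitions.

    dir_filters is a hypothetical representation of the query's directory
    filters: one list of allowed values per level (dir0, dir1, ...), in order.
    """
    path = table_root
    for values in dir_filters:
        if len(values) != 1:
            # Several subpartitions at this level: stop at their common parent.
            break
        path = posixpath.join(path, values[0])
    return path


# dir0 = 'B' AND dir1 IN ('1', '2', '3')
print(cache_dir_for_filters("/A", [["B"], ["1", "2", "3"]]))  # /A/B
```

With a single value at every level (the case DRILL-4530 addresses), the same function descends all the way to the leaf directory, e.g. `/A/B/1`.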



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
