[jira] [Commented] (DRILL-4530) Improve metadata cache performance for queries with single partition

ASF GitHub Bot (JIRA) Thu, 16 Jun 2016 10:52:49 -0700

    [ https://issues.apache.org/jira/browse/DRILL-4530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15334300#comment-15334300 ]


ASF GitHub Bot commented on DRILL-4530:
---------------------------------------

Github user amansinha100 commented on a diff in the pull request:

    https://github.com/apache/drill/pull/519#discussion_r67392462
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetFormatPlugin.java ---
    @@ -208,8 +209,18 @@ public DrillTable isReadable(DrillFileSystem fs, FileSelection selection,
             FileSystemPlugin fsPlugin, String storageEngineName, String userName)
             throws IOException {
           // TODO: we only check the first file for directory reading.
    -      if(selection.containsDirectories(fs)){
    -        if(isDirReadable(fs, selection.getFirstPath(fs))){
    +      if(selection.containsDirectories(fs)) {
    +        Path dirMetaPath = new Path(selection.getSelectionRoot(), Metadata.METADATA_DIRECTORIES_FILENAME);
    +        if (fs.exists(dirMetaPath)) {
    +          ParquetTableMetadataDirs mDirs = Metadata.readMetadataDirs(fs, dirMetaPath.toString());
    +          if (mDirs.getDirectories().size() > 0) {
    +            FileSelection dirSelection = FileSelection.createFromDirectories(mDirs.getDirectories(), selection);
    +            dirSelection.setExpandedPartial();
    +            return new DynamicDrillTable(fsPlugin, storageEngineName, userName,
    --- End diff --
    
    Makes sense. I will add a comment. Thanks for reviewing.
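The diff replaces the old check (probe only the first path with `isDirReadable`) with a fast path: if a directories metadata file exists under the selection root and lists at least one directory, the table is built directly from that list. The following is a minimal, self-contained sketch of that control flow under stated assumptions. The Drill types (`FileSelection`, `ParquetTableMetadataDirs`, `DynamicDrillTable`) are replaced by a hypothetical in-memory model. The directories file is modeled as a plain text file with one directory per line, because the real file's format is not shown in the diff. `DIRS_FILENAME` and `resolveSelection` are illustrative names, not Drill identifiers.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;

public class DirMetadataFastPath {

    // Hypothetical stand-in for Metadata.METADATA_DIRECTORIES_FILENAME.
    static final String DIRS_FILENAME = "dirs.meta";

    // Hypothetical stand-in for FileSelection: the listed directories plus a
    // flag mirroring the diff's setExpandedPartial() call.
    record Selection(List<String> directories, boolean expandedPartial) {}

    /**
     * Returns a selection built from the directories metadata file if it exists
     * and lists at least one directory; otherwise returns null so the caller can
     * fall back to the old behavior (probing the first path).
     */
    static Selection resolveSelection(Path selectionRoot) throws IOException {
        Path dirMetaPath = selectionRoot.resolve(DIRS_FILENAME);
        if (!Files.exists(dirMetaPath)) {
            return null; // no directories cache: use the slow path
        }
        List<String> dirs = Files.readAllLines(dirMetaPath).stream()
                .map(String::trim)
                .filter(line -> !line.isEmpty())
                .collect(Collectors.toList());
        if (dirs.isEmpty()) {
            return null; // empty list: nothing to shortcut
        }
        // The directory list comes from the cache, so the file system is not
        // re-listed here; the flag mirrors setExpandedPartial() in the diff.
        return new Selection(dirs, true);
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("A");
        System.out.println(resolveSelection(root)); // prints null
        Files.write(root.resolve(DIRS_FILENAME), List.of("A/B", "A/B/C"));
        System.out.println(resolveSelection(root));
    }
}
```

Returning `null` keeps the fallback decision with the caller, which matches the diff's structure: the new branch only returns early when the cached directory list is usable.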


> Improve metadata cache performance for queries with single partition 
> ---------------------------------------------------------------------
>
>                 Key: DRILL-4530
>                 URL: https://issues.apache.org/jira/browse/DRILL-4530
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Query Planning & Optimization
>    Affects Versions: 1.6.0
>            Reporter: Aman Sinha
>            Assignee: Aman Sinha
>             Fix For: 1.7.0
>
>
> Consider two types of queries which are run with Parquet metadata caching: 
> {noformat}
> query 1:
> SELECT col FROM  `A/B/C`;
> query 2:
> SELECT col FROM `A` WHERE dir0 = 'B' AND dir1 = 'C';
> {noformat}
> For a certain dataset, the elapsed time of query 1 is 1 sec, whereas that of
> query 2 is 9 sec, even though both access the same amount of data. The user
> expectation is that they should perform roughly the same. The main difference
> comes from query 2 reading the bigger metadata cache file at the root level 'A'
> and then applying the partitioning filter, whereas query 1 reads a much smaller
> metadata cache file at the subdirectory level.
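In Drill, `dir0`, `dir1`, ... are implicit columns holding the first, second, ... directory level below the queried root, so a filter that pins `dir0` and `dir1` to single values identifies exactly one subdirectory. The sketch below is a hypothetical illustration of that observation, not the fix committed for DRILL-4530. It turns such equality filters into a subdirectory path and picks the most specific directory that has a cache file, falling back to the root. The class and method names, the map-of-equalities representation of the filter, and the `cachedDirs` input are assumptions.

```java
import java.util.Map;
import java.util.Set;

public class SinglePartitionCache {

    /**
     * Given the queried root (e.g. "A") and equality filters on dirN columns
     * (e.g. dir0='B', dir1='C'), walks the contiguous prefix dir0, dir1, ...
     * and returns the deepest directory on that path that has a cache file.
     * Falls back to the root when dir0 is not pinned or no deeper cache exists.
     *
     * @param cachedDirs directories assumed to have a metadata cache file (hypothetical input)
     */
    static String chooseCacheDir(String root, Map<String, String> dirEqualities,
                                 Set<String> cachedDirs) {
        String best = root;
        String current = root;
        for (int level = 0; ; level++) {
            String value = dirEqualities.get("dir" + level);
            if (value == null) {
                break; // the prefix of pinned directory levels ends here
            }
            current = current + "/" + value;
            if (cachedDirs.contains(current)) {
                best = current; // a narrower cache file is available
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Set<String> cached = Set.of("A", "A/B", "A/B/C");
        // query 2: WHERE dir0 = 'B' AND dir1 = 'C' resolves to the same cache as query 1
        System.out.println(chooseCacheDir("A", Map.of("dir0", "B", "dir1", "C"), cached)); // prints A/B/C
        // only dir1 pinned: no single subdirectory is identified, so the root cache is used
        System.out.println(chooseCacheDir("A", Map.of("dir1", "C"), cached)); // prints A
    }
}
```

Requiring a contiguous prefix starting at `dir0` is the design choice that keeps this safe: a filter on `dir1` alone matches `A/*/C`, which spans several subdirectories and cannot be served by one narrower cache file.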



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
