[jira] [Commented] (DRILL-4530) Improve metadata cache performance for queries with single partition

ASF GitHub Bot (JIRA) Tue, 12 Jul 2016 15:16:17 -0700

    [ https://issues.apache.org/jira/browse/DRILL-4530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15373801#comment-15373801 ]


ASF GitHub Bot commented on DRILL-4530:
---------------------------------------

Github user jinfengni commented on a diff in the pull request:

    https://github.com/apache/drill/pull/519#discussion_r70532442
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/TestParquetMetadataCache.java ---
    @@ -211,9 +217,76 @@ public void testNoSupportedError() throws Exception {
             .go();
       }
     
    +  @Test // DRILL-4530
    +  public void testDrill4530_1() throws Exception {
    +    // create metadata cache
    +    test(String.format("refresh table metadata dfs_test.`%s/%s`", getDfsTestTmpSchemaLocation(), tableName2));
    +    checkForMetadataFile(tableName2);
    +
    +    // run query and check correctness
    +    String query1 = String.format("select dir0, dir1, o_custkey, o_orderdate from dfs_test.`%s/%s` " +
    +            " where dir0=1995 and dir1='Q3'",
    +        getDfsTestTmpSchemaLocation(), tableName2);
    +    int expectedRowCount = 20;
    +    int expectedNumFiles = 2;
    +
    +    int actualRowCount = testSql(query1);
    +    assertEquals(expectedRowCount, actualRowCount);
    +    String numFilesPattern = "numFiles=" + expectedNumFiles;
    +    String usedMetaPattern = "usedMetadataFile=true";
    +    String cacheFileRootPattern = String.format("%s/%s/1995/Q3", getDfsTestTmpSchemaLocation(), tableName2);
    --- End diff --
    
    The verification of cacheFileRootPattern probably needs "cacheFileRoot="
    as the prefix. Otherwise, the list of files in the GroupScan will always
    produce a match for the cacheFileRoot pattern, right?
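
    A minimal sketch of that concern, under stated assumptions: the plan
    strings, the /tmp/orders root, and the file names below are hypothetical
    stand-ins loosely shaped like a group scan entry, not actual Drill output,
    and a plain substring check stands in for the test's pattern matching. It
    illustrates how a bare path can match the file list even when the cache
    file root was not narrowed, while the prefixed form cannot.

```java
public class CacheFileRootPatternCheck {
    // Hypothetical table root; the real test builds it from
    // getDfsTestTmpSchemaLocation() and tableName2.
    static final String ROOT = "/tmp/orders";

    // Hypothetical plan text where the cache file root WAS narrowed to the partition.
    static final String PRUNED_PLAN =
        "ParquetGroupScan [entries=[ReadEntryWithPath [path=" + ROOT + "/1995/Q3/a.parquet], "
        + "ReadEntryWithPath [path=" + ROOT + "/1995/Q3/b.parquet]], "
        + "numFiles=2, usedMetadataFile=true, cacheFileRoot=" + ROOT + "/1995/Q3]";

    // Hypothetical plan text where the cache file root was NOT narrowed:
    // the same files are scanned, but the root-level cache file is used.
    static final String UNPRUNED_PLAN =
        "ParquetGroupScan [entries=[ReadEntryWithPath [path=" + ROOT + "/1995/Q3/a.parquet], "
        + "ReadEntryWithPath [path=" + ROOT + "/1995/Q3/b.parquet]], "
        + "numFiles=2, usedMetadataFile=true, cacheFileRoot=" + ROOT + "]";

    // Simplification: substring containment instead of regex matching.
    static boolean planContains(String plan, String expected) {
        return plan.contains(expected);
    }

    public static void main(String[] args) {
        String barePattern = ROOT + "/1995/Q3";
        String prefixedPattern = "cacheFileRoot=" + barePattern;

        // The bare path matches both plans because it also appears in the file list.
        System.out.println("bare vs unpruned:     " + planContains(UNPRUNED_PLAN, barePattern));
        // The prefixed pattern only matches when the cache file root itself was narrowed.
        System.out.println("prefixed vs unpruned: " + planContains(UNPRUNED_PLAN, prefixedPattern));
        System.out.println("prefixed vs pruned:   " + planContains(PRUNED_PLAN, prefixedPattern));
    }
}
```

    With the prefix, the check is tied to the cacheFileRoot attribute itself
    rather than to any occurrence of the partition path in the plan text.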



> Improve metadata cache performance for queries with single partition 
> ---------------------------------------------------------------------
>
>                 Key: DRILL-4530
>                 URL: https://issues.apache.org/jira/browse/DRILL-4530
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Query Planning & Optimization
>    Affects Versions: 1.6.0
>            Reporter: Aman Sinha
>            Assignee: Aman Sinha
>             Fix For: Future
>
>
> Consider two types of queries which are run with Parquet metadata caching: 
> {noformat}
> query 1:
> SELECT col FROM  `A/B/C`;
> query 2:
> SELECT col FROM `A` WHERE dir0 = 'B' AND dir1 = 'C';
> {noformat}
> For a certain dataset, the query1 elapsed time is 1 sec whereas query2 
> elapsed time is 9 sec even though both are accessing the same amount of data. 
>  The user expectation is that they should perform roughly the same.  The main 
> difference comes from reading the bigger metadata cache file at the root 
> level 'A' for query2 and then applying the partitioning filter.  query1 reads 
> a much smaller metadata cache file at the subdirectory level. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
