[ https://issues.apache.org/jira/browse/DRILL-4530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15373801#comment-15373801 ]
ASF GitHub Bot commented on DRILL-4530:
---------------------------------------

Github user jinfengni commented on a diff in the pull request:

    https://github.com/apache/drill/pull/519#discussion_r70532442

    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/TestParquetMetadataCache.java ---
    @@ -211,9 +217,76 @@ public void testNoSupportedError() throws Exception {
             .go();
       }

    +  @Test // DRILL-4530
    +  public void testDrill4530_1() throws Exception {
    +    // create metadata cache
    +    test(String.format("refresh table metadata dfs_test.`%s/%s`", getDfsTestTmpSchemaLocation(), tableName2));
    +    checkForMetadataFile(tableName2);
    +
    +    // run query and check correctness
    +    String query1 = String.format("select dir0, dir1, o_custkey, o_orderdate from dfs_test.`%s/%s` " +
    +        " where dir0=1995 and dir1='Q3'",
    +        getDfsTestTmpSchemaLocation(), tableName2);
    +    int expectedRowCount = 20;
    +    int expectedNumFiles = 2;
    +
    +    int actualRowCount = testSql(query1);
    +    assertEquals(expectedRowCount, actualRowCount);
    +    String numFilesPattern = "numFiles=" + expectedNumFiles;
    +    String usedMetaPattern = "usedMetadataFile=true";
    +    String cacheFileRootPattern = String.format("%s/%s/1995/Q3", getDfsTestTmpSchemaLocation(), tableName2);
    --- End diff --

    The verification of cacheFileRootPattern probably needs "cacheFileRoot=" as a prefix. Otherwise the pattern will always find a match in the list of files in GroupScan, whatever the value of cacheFileRoot is, right?
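
    To illustrate the concern, here is a minimal standalone sketch. It does not use Drill's test framework: the plan text layout, the /tmp/t2 paths, and the helper names planText and planContains are all made up for illustration, and only the numFiles=, usedMetadataFile= and cacheFileRoot= labels come from the diff and the comment above. It shows that a bare directory pattern can match a file path listed in the scan even when the cache was read from the table root, while the prefixed pattern cannot.

```java
import java.util.regex.Pattern;

public class CacheFileRootPatternSketch {

    // Hypothetical plan text: the layout is an assumption for illustration,
    // not Drill's actual plan output. The two file paths stand in for the
    // list of files that a group scan prints.
    static String planText(String cacheFileRoot) {
        return "GroupScan [entries=[path=/tmp/t2/1995/Q3/a.parquet, "
            + "path=/tmp/t2/1995/Q3/b.parquet], "
            + "numFiles=2, usedMetadataFile=true, cacheFileRoot=" + cacheFileRoot + "]";
    }

    // Literal substring search expressed as a regex, quoting the input so
    // that path characters are not interpreted as regex syntax.
    static boolean planContains(String plan, String literal) {
        return Pattern.compile(Pattern.quote(literal)).matcher(plan).find();
    }

    public static void main(String[] args) {
        String partitionDir = "/tmp/t2/1995/Q3";

        // The case the test is meant to catch: cache read from the table root.
        String rootCachePlan = planText("/tmp/t2");
        // The desired case: cache read from the partition directory.
        String partitionCachePlan = planText(partitionDir);

        // Bare pattern: matches through the file list, so it cannot fail.
        System.out.println(planContains(rootCachePlan, partitionDir));
        // Prefixed pattern: only matches the cacheFileRoot attribute itself.
        System.out.println(planContains(rootCachePlan, "cacheFileRoot=" + partitionDir));
        System.out.println(planContains(partitionCachePlan, "cacheFileRoot=" + partitionDir));
    }
}
```

    With the prefix in place, the check is tied to the attribute being verified rather than to any occurrence of the directory path in the plan.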

> Improve metadata cache performance for queries with single partition
> ---------------------------------------------------------------------
>
>                 Key: DRILL-4530
>                 URL: https://issues.apache.org/jira/browse/DRILL-4530
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Query Planning & Optimization
>    Affects Versions: 1.6.0
>            Reporter: Aman Sinha
>            Assignee: Aman Sinha
>             Fix For: Future
>
>
> Consider two types of queries which are run with Parquet metadata caching:
> {noformat}
> query 1:
> SELECT col FROM `A/B/C`;
> query 2:
> SELECT col FROM `A` WHERE dir0 = 'B' AND dir1 = 'C';
> {noformat}
> For a certain dataset, the query1 elapsed time is 1 sec whereas query2
> elapsed time is 9 sec even though both are accessing the same amount of data.
> The user expectation is that they should perform roughly the same. The main
> difference comes from reading the bigger metadata cache file at the root
> level 'A' for query2 and then applying the partitioning filter. query1 reads
> a much smaller metadata cache file at the subdirectory level.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)