[
https://issues.apache.org/jira/browse/DRILL-4530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15334300#comment-15334300
]
ASF GitHub Bot commented on DRILL-4530:
---------------------------------------
Github user amansinha100 commented on a diff in the pull request:
https://github.com/apache/drill/pull/519#discussion_r67392462
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetFormatPlugin.java ---
@@ -208,8 +209,18 @@ public DrillTable isReadable(DrillFileSystem fs, FileSelection selection,
       FileSystemPlugin fsPlugin, String storageEngineName, String userName)
       throws IOException {
     // TODO: we only check the first file for directory reading.
-    if(selection.containsDirectories(fs)){
-      if(isDirReadable(fs, selection.getFirstPath(fs))){
+    if(selection.containsDirectories(fs)) {
+      Path dirMetaPath = new Path(selection.getSelectionRoot(), Metadata.METADATA_DIRECTORIES_FILENAME);
+      if (fs.exists(dirMetaPath)) {
+        ParquetTableMetadataDirs mDirs = Metadata.readMetadataDirs(fs, dirMetaPath.toString());
+        if (mDirs.getDirectories().size() > 0) {
+          FileSelection dirSelection = FileSelection.createFromDirectories(mDirs.getDirectories(), selection);
+          dirSelection.setExpandedPartial();
+          return new DynamicDrillTable(fsPlugin, storageEngineName, userName,
--- End diff --
Makes sense. I will add a comment. Thanks for reviewing.
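For readers without the Drill sources at hand, the check in the diff above can be pictured roughly as follows. This is only a minimal sketch built on java.nio.file rather than Drill's DrillFileSystem. The class name DirCacheSketch, the file name dirs.cache, and the one-directory-per-line format are all assumptions made for illustration; they are not Drill's actual classes or cache format.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DirCacheSketch {
  // Hypothetical file name; stands in for Metadata.METADATA_DIRECTORIES_FILENAME.
  static final String DIRS_FILE = "dirs.cache";

  /**
   * Returns the subdirectories listed in the directories cache file under root.
   * Returns an empty list when the cache file is missing, in which case the
   * caller would fall back to the older "check the first file" path.
   */
  static List<Path> cachedDirectories(Path root) throws IOException {
    Path dirMetaPath = root.resolve(DIRS_FILE);
    List<Path> dirs = new ArrayList<>();
    if (!Files.exists(dirMetaPath)) {
      return dirs;
    }
    // Assumed format: one relative directory per line, blank lines ignored.
    for (String line : Files.readAllLines(dirMetaPath)) {
      String trimmed = line.trim();
      if (!trimmed.isEmpty()) {
        dirs.add(root.resolve(trimmed));
      }
    }
    return dirs;
  }

  public static void main(String[] args) throws IOException {
    Path root = Files.createTempDirectory("A");
    Files.write(root.resolve(DIRS_FILE), Arrays.asList("B", "B/C"));
    // A non-empty result means the selection can be built from the cached
    // directory list instead of walking the file system.
    System.out.println(cachedDirectories(root));
  }
}
```

The point of the sketch is the order of the checks: look for the directories file first, and only use it when it actually lists something.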
> Improve metadata cache performance for queries with single partition
> ---------------------------------------------------------------------
>
> Key: DRILL-4530
> URL: https://issues.apache.org/jira/browse/DRILL-4530
> Project: Apache Drill
> Issue Type: Improvement
> Components: Query Planning & Optimization
> Affects Versions: 1.6.0
> Reporter: Aman Sinha
> Assignee: Aman Sinha
> Fix For: 1.7.0
>
>
> Consider two types of queries which are run with Parquet metadata caching:
> {noformat}
> query 1:
> SELECT col FROM `A/B/C`;
> query 2:
> SELECT col FROM `A` WHERE dir0 = 'B' AND dir1 = 'C';
> {noformat}
> For a certain dataset, the query1 elapsed time is 1 sec whereas query2
> elapsed time is 9 sec even though both are accessing the same amount of data.
> The user expectation is that they should perform roughly the same. The main
> difference comes from reading the bigger metadata cache file at the root
> level 'A' for query2 and then applying the partitioning filter. query1 reads
> a much smaller metadata cache file at the subdirectory level.
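To make the equivalence of the two queries in the description concrete: a filter on dir0 and dir1 against root 'A' names the same subdirectory that query 1 addresses directly. The following is a minimal sketch of that mapping only. PartitionPathSketch and toSubdirectory are hypothetical names, and this is not a description of how Drill's planner performs the pruning.

```java
import java.util.Arrays;
import java.util.List;

class PartitionPathSketch {
  /**
   * Joins the root with consecutive dir0, dir1, ... equality values.
   * A null entry means "no filter on this level", so the path stops there.
   */
  static String toSubdirectory(String root, List<String> dirValues) {
    StringBuilder path = new StringBuilder(root);
    for (String value : dirValues) {
      if (value == null) {
        break;
      }
      path.append('/').append(value);
    }
    return path.toString();
  }

  public static void main(String[] args) {
    // WHERE dir0 = 'B' AND dir1 = 'C' on root A
    System.out.println(toSubdirectory("A", Arrays.asList("B", "C"))); // prints A/B/C
  }
}
```

Once the filter is resolved to a single subdirectory like this, the smaller cache file at that level is in principle enough to answer the query.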
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)