[
https://issues.apache.org/jira/browse/DRILL-4250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jinfeng Ni reassigned DRILL-4250:
---------------------------------
Assignee: Jinfeng Ni
> File system directory-based partition pruning does not work when a directory
> contains both subdirectories and files.
> ----------------------------------------------------------------------------------------------------------------------
>
> Key: DRILL-4250
> URL: https://issues.apache.org/jira/browse/DRILL-4250
> Project: Apache Drill
> Issue Type: Bug
> Components: Query Planning & Optimization
> Reporter: Jinfeng Ni
> Assignee: Jinfeng Ni
>
> When a directory contains both subdirectories and files, directory-based
> partition pruning does not work.
> For example, I have the following directory structure with nation.parquet
> (copied from the TPC-H sample dataset).
> .//2001/Q1/nation.parquet
> .//2001/Q2/nation.parquet
> For the following query, directory-based partition pruning works correctly.
>
> {code}
> explain plan for select * from dfs.tmp.fileAndDir where dir0 = 2001 and dir1 = 'Q1';
> 00-00 Screen
> 00-01 Project(*=[$0])
> 00-02 Project(*=[$0])
> 00-03 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath
> [path=file:/tmp/fileAndDir/2001/Q1/nation.parquet]],
> selectionRoot=file:/tmp/fileAndDir, numFiles=1, usedMetadataFile=false,
> columns=[`*`]]])
> {code}
> However, if I add a nation.parquet file to the 2001 directory, like the following:
> .//2001/nation.parquet
> .//2001/Q1/nation.parquet
> .//2001/Q2/nation.parquet
> Then partition pruning is not applied to the same query: the plan scans all
> three files and keeps the Filter operator.
> {code}
> explain plan for select * from dfs.tmp.fileAndDir where dir0 = 2001 and dir1 = 'Q1';
> +------+------+
> | text | json |
> +------+------+
> | 00-00 Screen
> 00-01 Project(*=[$0])
> 00-02 Project(T0¦¦*=[$0])
> 00-03 SelectionVectorRemover
> 00-04 Filter(condition=[AND(=($1, 2001), =($2, 'Q1'))])
> 00-05 Project(T0¦¦*=[$0], dir0=[$1], dir1=[$2])
> 00-06 Scan(groupscan=[ParquetGroupScan
> [entries=[ReadEntryWithPath [path=file:/tmp/fileAndDir/2001/nation.parquet],
> ReadEntryWithPath [path=file:/tmp/fileAndDir/2001/Q1/nation.parquet],
> ReadEntryWithPath [path=file:/tmp/fileAndDir/2001/Q2/nation.parquet]],
> selectionRoot=file:/tmp/fileAndDir, numFiles=3, usedMetadataFile=false,
> columns=[`*`]]])
> {code}
> I should note that in the second case, where partition pruning did not work,
> the query still returned the correct result. Therefore, this issue only affects
> query performance, not the query result.