[ 
https://issues.apache.org/jira/browse/DRILL-4250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jinfeng Ni resolved DRILL-4250.
-------------------------------
       Resolution: Fixed
    Fix Version/s: 1.5.0

Fixed in commit: b9bc35a89208d2dd03f1ed751f71a0cd23651c9a

> File system directory-based partition pruning does not work when a directory 
> contains both subdirectories and files.  
> ----------------------------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-4250
>                 URL: https://issues.apache.org/jira/browse/DRILL-4250
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>            Reporter: Jinfeng Ni
>            Assignee: Jinfeng Ni
>             Fix For: 1.5.0
>
>
> When a directory contains both subdirectories and files, directory-based 
> partition pruning does not work. 
> For example, I have the following directory structure with nation.parquet 
> (copied from tpch sample dataset).
> .//2001/Q1/nation.parquet
> .//2001/Q2/nation.parquet
> For the following query, directory-based partition pruning works correctly. 
>  
> {code}
> explain plan for select * from dfs.tmp.fileAndDir where dir0 = 2001 and dir1 
> = 'Q1';
> 00-00    Screen
> 00-01      Project(*=[$0])
> 00-02        Project(*=[$0])
> 00-03          Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
> [path=file:/tmp/fileAndDir/2001/Q1/nation.parquet]], 
> selectionRoot=file:/tmp/fileAndDir, numFiles=1, usedMetadataFile=false, 
> columns=[`*`]]])
> {code}
> However, if I add a nation.parquet file to the 2001 directory, like the following:
> .//2001/nation.parquet
> .//2001/Q1/nation.parquet
> .//2001/Q2/nation.parquet
> Then partition pruning is no longer applied to the same query.
> {code}
> explain plan for select * from dfs.tmp.fileAndDir where dir0 = 2001 and dir1 
> = 'Q1';
> +------+------+
> | text | json |
> +------+------+
> | 00-00    Screen
> 00-01      Project(*=[$0])
> 00-02        Project(T0¦¦*=[$0])
> 00-03          SelectionVectorRemover
> 00-04            Filter(condition=[AND(=($1, 2001), =($2, 'Q1'))])
> 00-05              Project(T0¦¦*=[$0], dir0=[$1], dir1=[$2])
> 00-06                Scan(groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath [path=file:/tmp/fileAndDir/2001/nation.parquet], 
> ReadEntryWithPath [path=file:/tmp/fileAndDir/2001/Q1/nation.parquet], 
> ReadEntryWithPath [path=file:/tmp/fileAndDir/2001/Q2/nation.parquet]], 
> selectionRoot=file:/tmp/fileAndDir, numFiles=3, usedMetadataFile=false, 
> columns=[`*`]]])
> {code}
> I should note that in the second case, where partition pruning did not work, 
> the query still returned the correct result. Therefore, this issue only 
> impacts query performance, not the query result. 
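
To make the expected behavior concrete, here is a minimal sketch of how pruning on dir0 and dir1 should narrow the file list for the mixed layout described above. It is an illustration only, not Drill's planner code: the helpers `partition_dirs` and `prune` are hypothetical names, directory values are compared as strings, and a file that sits above a given level is assumed to have no value for that dirN column, so an equality filter on it rejects the file.

```python
from pathlib import PurePosixPath


def partition_dirs(selection_root, file_path):
    """Return the directory names between selection_root and the file.

    Index 0 corresponds to dir0, index 1 to dir1, and so on.
    """
    relative = PurePosixPath(file_path).relative_to(selection_root)
    return list(relative.parts[:-1])  # drop the file name itself


def prune(selection_root, file_paths, wanted):
    """Keep only files whose dirN equals wanted[N] for every N in wanted."""
    kept = []
    for path in file_paths:
        dirs = partition_dirs(selection_root, path)
        # Assumption: a file shallower than level N has no dirN value,
        # so an equality predicate on dirN cannot match it.
        if all(n < len(dirs) and dirs[n] == value for n, value in wanted.items()):
            kept.append(path)
    return kept


ROOT = "/tmp/fileAndDir"
FILES = [
    "/tmp/fileAndDir/2001/nation.parquet",
    "/tmp/fileAndDir/2001/Q1/nation.parquet",
    "/tmp/fileAndDir/2001/Q2/nation.parquet",
]

# Equivalent of: where dir0 = 2001 and dir1 = 'Q1'
print(prune(ROOT, FILES, {0: "2001", 1: "Q1"}))
```

Under these assumptions only the Q1 file survives, which matches the numFiles=1 scan in the first plan; the second plan instead scans all three files and filters afterwards.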



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
