[ https://issues.apache.org/jira/browse/DRILL-2517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15204740#comment-15204740 ]
Veera Naranammalpuram commented on DRILL-2517:
----------------------------------------------
RE: The log shows that the time for reading Parquet metadata from footer files
is significantly reduced (from 7388ms to 102ms), due to the pruning effect.
[~jni] The comment above talks about reading Parquet footers. Will this
enhancement also apply when metadata cache files are present? In other
words, will the planner skip reading metadata cache files that belong to a
directory that the query will not access?
> Apply Partition pruning before reading files during planning
> ------------------------------------------------------------
>
> Key: DRILL-2517
> URL: https://issues.apache.org/jira/browse/DRILL-2517
> Project: Apache Drill
> Issue Type: New Feature
> Components: Query Planning & Optimization
> Affects Versions: 0.7.0, 0.8.0
> Reporter: Adam Gilmore
> Assignee: Jinfeng Ni
> Fix For: 1.6.0, Future
>
>
> Partition pruning still tries to read Parquet files during the planning stage
> even though they don't match the partition filter.
> For example, if there were an invalid Parquet file in a directory that should
> not be queried:
> {code}
> 0: jdbc:drill:zk=local> select sum(price) from dfs.tmp.purchases where dir0 = 1;
> Query failed: IllegalArgumentException: file:/tmp/purchases/4/0_0_0.parquet is not a Parquet file (too small)
> {code}
> The reason is that the partition pruning happens after the Parquet plugin
> tries to read the footer of each file.
> Ideally, partition pruning would happen before the format plugin gets
> involved.
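The ordering requested in the description can be illustrated with a minimal sketch. This is not Drill's planner code: `dir0_of`, `read_footer`, `plan_scan`, the `sizes` mapping, and the 12-byte minimum are all hypothetical stand-ins chosen for illustration. The point is only that the `dir0` filter is applied to file paths first, so the footer reader never sees the invalid file under `/tmp/purchases/4`.

```python
from pathlib import PurePosixPath

# Illustrative threshold only (assumption): a file smaller than this cannot
# hold a valid footer in this sketch.
MIN_FILE_SIZE = 12


def dir0_of(table_root, file_path):
    """Return the first directory level below the table root (the dir0 value)."""
    return PurePosixPath(file_path).relative_to(table_root).parts[0]


def read_footer(path, sizes):
    """Hypothetical stand-in for a format plugin's footer reader."""
    if sizes[path] < MIN_FILE_SIZE:
        raise ValueError(f"{path} is not a Parquet file (too small)")
    return {"path": path}


def plan_scan(table_root, sizes, dir0_value):
    """Prune by directory name first, then read footers of the survivors only."""
    selected = [p for p in sorted(sizes) if dir0_of(table_root, p) == dir0_value]
    return [read_footer(p, sizes) for p in selected]


# File sizes in bytes; the file under directory 4 is deliberately invalid.
sizes = {
    "/tmp/purchases/1/0_0_0.parquet": 4096,
    "/tmp/purchases/4/0_0_0.parquet": 3,
}

footers = plan_scan("/tmp/purchases", sizes, "1")
```

With pruning applied first, a filter on `dir0 = 1` never touches the broken file, while a filter on `dir0 = 4` still fails, which is the expected behavior.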
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)