Aman Sinha created DRILL-3948:
---------------------------------
Summary: Partitioning columns of a Parquet table should be made
visible to end user
Key: DRILL-3948
URL: https://issues.apache.org/jira/browse/DRILL-3948
Project: Apache Drill
Issue Type: Improvement
Components: Metadata, Query Planning & Optimization
Affects Versions: 1.2.0
Reporter: Aman Sinha
For Parquet files, Drill can do partition pruning for filter conditions on a
column that satisfies the following criterion:
Each Parquet file has a single value of that column. The Parquet metadata is
examined for the min and max values of that column and, if they are the same,
the column is considered a partitioning column.
When CTAS auto-partition is used, the above criterion is enforced, but files
created through external methods could also satisfy it.
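To make the criterion concrete, here is a minimal sketch of the min/max check
over per-file column statistics. It is not Drill's implementation: the
function name, the FileStats type, and the sample data are hypothetical, and
the sketch ignores nulls and files with missing statistics.

```python
from typing import Dict, List, Tuple

# Hypothetical per-file statistics: column name -> (min, max).
# In practice these would come from the Parquet footer metadata.
FileStats = Dict[str, Tuple[object, object]]


def candidate_partition_columns(files: List[FileStats]) -> List[str]:
    """Return columns whose min equals max in every file."""
    if not files:
        return []
    # Only columns that have statistics in every file can qualify.
    common = set(files[0])
    for stats in files[1:]:
        common &= set(stats)
    return sorted(
        col for col in common
        if all(stats[col][0] == stats[col][1] for stats in files)
    )


# Made-up sample: 'year' is constant within each file, 'amount' is not.
files = [
    {"year": (2014, 2014), "amount": (1.5, 90.0)},
    {"year": (2015, 2015), "amount": (3.0, 42.0)},
]
print(candidate_partition_columns(files))  # ['year']
```

Note that 'year' qualifies even though its value differs between files; the
criterion only requires a single value within each file.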
It is difficult for users to know exactly which columns in the table are
candidate partitioning columns. We should provide this information in a
user-friendly way, for instance:
- A special 'show partition columns for <table>' command
- In the Explain plan, show the partition columns for the table in the Scan node
More options should be discussed.