[ 
https://issues.apache.org/jira/browse/FLINK-5859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15881999#comment-15881999
 ] 

godfrey he edited comment on FLINK-5859 at 2/24/17 5:59 AM:
------------------------------------------------------------

Hi [~fhueske], thanks for your advice.
Yes, partition pruning is a kind of coarse-grained filter push-down. Both 
filter-pushdown and partition-pruning share a common step: extracting the 
predicates that a given datasource can use from the filter condition. 
In general, however, filter-pushdown and partition-pruning are independent 
concepts. 
The following table shows that different datasources have different traits:

||Trait||Example||
|filter-pushdown only|MySQL, HBase|
|partition-pruning only|CSV, TEXT|
|both filter-pushdown and partition-pruning|Parquet, Druid|

IMO, we should give developers a clear concept, as [~ykt836] mentioned above, 
that includes both FilterableTableSource and PartitionableTableSource as 
separate traits.
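
To make the distinction concrete, here is a rough Java sketch of the two traits as independent interfaces. Only the two interface names come from this discussion. The method names and signatures, the two example source classes, and the prune helper are hypothetical, chosen for illustration, and are not the actual Flink API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Illustrative sketch only: all signatures below are assumptions, not Flink's API.
final class SourceTraitsSketch {

    /** Hypothetical trait: the source can evaluate row-level predicates itself. */
    interface FilterableTableSource {
        // Accepts the predicates it supports and returns the ones left for the engine.
        List<String> applyPredicates(List<String> predicates);
    }

    /** Hypothetical trait: the source exposes partitions and can skip whole ones. */
    interface PartitionableTableSource {
        List<String> getAllPartitions();
        void applyPrunedPartitions(List<String> remaining);
    }

    /** Partition pruning only, e.g. a directory of CSV files. */
    static class PartitionOnlySource implements PartitionableTableSource {
        private final List<String> all;
        List<String> toRead;

        PartitionOnlySource(List<String> partitions) {
            this.all = partitions;
            this.toRead = partitions;
        }

        public List<String> getAllPartitions() { return all; }

        public void applyPrunedPartitions(List<String> remaining) { this.toRead = remaining; }
    }

    /** Both traits, e.g. a Parquet-like source. */
    static class PartitionAndFilterSource extends PartitionOnlySource
            implements FilterableTableSource {
        final List<String> pushed = new ArrayList<>();

        PartitionAndFilterSource(List<String> partitions) { super(partitions); }

        public List<String> applyPredicates(List<String> predicates) {
            pushed.addAll(predicates);   // this toy source pretends to support every predicate
            return new ArrayList<>();    // nothing is left for the engine to evaluate
        }
    }

    /** Planner-side step: keep only the partitions that match the partition predicate. */
    static List<String> prune(PartitionableTableSource source, Predicate<String> keep) {
        List<String> remaining = new ArrayList<>();
        for (String partition : source.getAllPartitions()) {
            if (keep.test(partition)) {
                remaining.add(partition);
            }
        }
        source.applyPrunedPartitions(remaining);
        return remaining;
    }

    public static void main(String[] args) {
        PartitionAndFilterSource src = new PartitionAndFilterSource(
                Arrays.asList("dt=2017-02-22", "dt=2017-02-23", "dt=2017-02-24"));
        // Query shape: WHERE dt = '2017-02-24' AND amount > 10
        List<String> kept = prune(src, p -> p.equals("dt=2017-02-24"));
        List<String> leftover = src.applyPredicates(Arrays.asList("amount > 10"));
        System.out.println(kept + " pushed=" + src.pushed + " leftover=" + leftover);
    }
}
```

In this sketch a source opts into either trait separately, so a CSV-like source implements only the partition trait while a Parquet-like source implements both, which matches the rows of the table above.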

Looking forward to your advice, thanks.



> support partition pruning on Table API & SQL
> --------------------------------------------
>
>                 Key: FLINK-5859
>                 URL: https://issues.apache.org/jira/browse/FLINK-5859
>             Project: Flink
>          Issue Type: New Feature
>          Components: Table API & SQL
>            Reporter: godfrey he
>            Assignee: godfrey he
>
> Many data sources are partitionable storage, e.g. HDFS and Druid, and many 
> queries only need to read a small subset of the total data. We can use 
> partition information to prune or skip over files irrelevant to the user’s 
> queries. Both query optimization time and execution time can be reduced 
> significantly, especially for a large partitioned table.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
