[
https://issues.apache.org/jira/browse/FLINK-5859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15880231#comment-15880231
]
Fabian Hueske commented on FLINK-5859:
--------------------------------------
Hi [~ykt836],
My main motivation to treat partition pruning as filter push-down is to keep
the complexity of the optimizer as small as possible.
You are right, the effort to determine whether a filter can be applied or not
depends on the format of the source. However, I don't think that this
necessarily means that partition pruning must be handled as a special case. In
the end it depends on the TableSource how it determines which predicates apply
and which don't. A partitionable table source would not need to scan all
metadata.
I see your point about the effort and complexity to implement a partitionable
TableSource.
What do you think of the following approach?
We implement a {{PartitionableTableSource}} as an abstract class that
implements the {{FilterableTableSource}} interface.
{{PartitionableTableSource}} would have abstract methods to list the
partitioned fields (and maybe some more). Based on that information
{{PartitionableTableSource}} implements
{{FilterableTableSource.setPredicate()}} and
{{FilterableTableSource.getPredicate()}}, i.e., the
{{PartitionableTableSource}} automatically extracts the right filter
expressions and returns everything it cannot deal with based on the provided
partitioned fields.
TableSources which just support filter push-down by partition pruning implement
{{PartitionableTableSource}} and only have to specify the partition columns and
not have to deal with {{setPredicate()}}.
This solution would keep all partition pruning related logic out of the
optimizer and table schemas.
What you think?
> support partition pruning on Table API & SQL
> --------------------------------------------
>
> Key: FLINK-5859
> URL: https://issues.apache.org/jira/browse/FLINK-5859
> Project: Flink
> Issue Type: New Feature
> Components: Table API & SQL
> Reporter: godfrey he
> Assignee: godfrey he
>
> Many data sources are partitionable storage, e.g. HDFS, Druid. And many
> queries just need to read a small subset of the total data. We can use
> partition information to prune or skip over files irrelevant to the user’s
> queries. Both query optimization time and execution time can be reduced
> obviously, especially for a large partitioned table.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)