[ https://issues.apache.org/jira/browse/FLINK-5185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15711579#comment-15711579 ]
Fabian Hueske commented on FLINK-5185:
--------------------------------------
Hi [~ykt836], you are right: I did not consider the case of pushing down
complete queries to sources.
However, I'm not sure we need to integrate this with {{BatchTableSourceScan}}.
It might make more sense to have a special {{RelNode}} implementation for this
more complex use case.
I would make support for pushing more complex operations (aggregations, joins)
into a {{TableSource}} a separate issue.
To be honest, I have not thought much about how this would be realized. I think
the Calcite community should have some experience with this and should be able
to give good advice here.
Regarding the question of "schema authority", I think the schema should be
governed by the {{RelNode}}, i.e., the optimizer's representation of operators.
In my opinion, it should be the optimizer's decision to push operations into
sources. As soon as the optimizer decides on such a rewrite, it must adapt the
schema of all involved operators. I do not see how this decision could be left
to the {{TableSource}} without going through the optimizer and its rewrite
rules. Of course, there needs to be an interaction between {{TableSource}} and
{{RelOptRule}} to identify which operations can be pushed down. I'm not sure
what the interface for this would look like.
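One possible division of labor can be sketched as follows, purely as an illustration: the source only reports which predicates it can evaluate, while an optimizer-side rule decides what to push and rebuilds the affected plan nodes. This is a minimal Python sketch under stated assumptions; all names ({{EqualityOnlySource}}, {{Scan}}, {{Filter}}, {{push_filter_rule}}) and the tuple-based predicate format are hypothetical and are not Flink or Calcite APIs.

```python
from dataclasses import dataclass
from typing import Tuple, Union

# Hypothetical predicate representation: (field_name, operator, literal).
Predicate = Tuple[str, str, object]


@dataclass(frozen=True)
class EqualityOnlySource:
    """Hypothetical source that can evaluate equality predicates only."""
    field_names: Tuple[str, ...]

    def can_push(self, pred: Predicate) -> bool:
        # The source only reports what it can evaluate; it does not rewrite the plan.
        return pred[1] == "="


@dataclass(frozen=True)
class Scan:
    """Hypothetical scan node that records the predicates pushed into it."""
    source: EqualityOnlySource
    pushed_filters: Tuple[Predicate, ...] = ()


@dataclass(frozen=True)
class Filter:
    """Hypothetical filter node sitting on top of a scan."""
    child: Scan
    predicates: Tuple[Predicate, ...]


def push_filter_rule(plan: Filter) -> Union[Filter, Scan]:
    """Optimizer-side rule: decides what to push and rebuilds the affected nodes."""
    scan = plan.child
    pushable = tuple(p for p in plan.predicates if scan.source.can_push(p))
    remaining = tuple(p for p in plan.predicates if not scan.source.can_push(p))
    if not pushable:
        return plan  # nothing the source can take; keep the plan unchanged
    new_scan = Scan(scan.source, scan.pushed_filters + pushable)
    # If every predicate was pushed, the Filter node is dropped entirely.
    return Filter(new_scan, remaining) if remaining else new_scan
```

For example, given the predicates a = 1 and b > 2, only a = 1 moves into the scan and b > 2 stays in a {{Filter}} above it. The decision stays inside the rule, which matches the view that the optimizer, not the source, is the schema authority.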
Do you think that the solution I proposed (optional projection / filter
parameters for {{BatchTableSourceScan}}) would suffice for the "simpler" cases
of selection and projection pushdown (which are applicable to many more
sources)?
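To make the "optional projection / filter parameters" idea concrete, here is a minimal Python sketch, assuming the scan keeps a reference to its source plus an optional projection (indices into the source schema) and an optional opaque filter expression, and derives its output row type from them. {{SourceSketch}}, {{BatchScanSketch}}, {{with_projection}} and {{with_filter}} are hypothetical names for illustration only, not the actual {{BatchTableSourceScan}} API.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class SourceSketch:
    """Hypothetical stand-in for a TableSource: field names and type names."""
    field_names: Tuple[str, ...]
    field_types: Tuple[str, ...]


@dataclass(frozen=True)
class BatchScanSketch:
    """Hypothetical scan node with optional projection and filter parameters."""
    source: SourceSketch
    projection: Optional[Tuple[int, ...]] = None  # indices into the source schema
    filter_expr: Optional[str] = None  # opaque predicate, for illustration only

    def row_type(self) -> List[Tuple[str, str]]:
        # The output schema comes from the source, narrowed by the projection.
        fields = list(zip(self.source.field_names, self.source.field_types))
        if self.projection is None:
            return fields
        return [fields[i] for i in self.projection]

    def with_projection(self, indices: Sequence[int]) -> "BatchScanSketch":
        # Indices refer to this scan's current output, so compose them with any
        # existing projection. The source object itself is left untouched.
        base = (self.projection if self.projection is not None
                else tuple(range(len(self.source.field_names))))
        return BatchScanSketch(self.source, tuple(base[i] for i in indices),
                               self.filter_expr)

    def with_filter(self, expr: str) -> "BatchScanSketch":
        # Combine with an already pushed filter using a textual AND.
        combined = (expr if self.filter_expr is None
                    else f"({self.filter_expr}) AND ({expr})")
        return BatchScanSketch(self.source, self.projection, combined)
```

Because each push-down returns a new node instead of a new source wrapper, the original source object stays the single place where the full schema lives, and the projected schema is always recomputed from it.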
> Decouple BatchTableSourceScan with TableSourceTable
> ---------------------------------------------------
>
> Key: FLINK-5185
> URL: https://issues.apache.org/jira/browse/FLINK-5185
> Project: Flink
> Issue Type: Improvement
> Components: Table API & SQL
> Affects Versions: 1.2.0
> Reporter: Kurt Young
> Assignee: zhangjing
> Priority: Minor
>
> As the component relationships in this design doc show:
> https://docs.google.com/document/d/1PBnEbOcFHlEF1qGGAUgJvINdEXzzFTIRElgvs4-Tdeo/
> We found it awkward that {{BatchTableSourceScan}} directly holds
> {{TableSourceTable}} and, through it, refers to {{TableSource}}. This is fine
> as long as the relationship is immutable, but when we want to change the
> {{TableSource}} while applying optimizations, it causes conflicts and
> confusion.
> Currently, the only way to change the {{TableSource}} is to create a new
> {{TableSourceTable}} holding the new {{TableSource}} and then create a new
> {{BatchTableSourceScan}} pointing to that newly created {{TableSourceTable}}.
> The problem is that the {{RelOptTable}} inherited from the superclass
> {{TableScan}} still references the original {{TableSourceTable}} and
> {{TableSource}}. This makes it unclear which one the {{Scan}} should rely on,
> and what the difference between these tables is.
> Besides, {{TableSourceTable}} is not very useful in {{BatchTableSourceScan}}:
> the only thing the {{Scan}} cares about is the {{RowType}} it returns, which
> is (and should be) decided by the {{TableSource}}. So we can let
> {{BatchTableSourceScan}} hold the {{TableSource}} directly instead of the
> {{TableSourceTable}}. If some of the original information is needed, the
> table can be found through {{RelOptTable}}.
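The conflict described in the quoted issue, and the proposed fix, can be illustrated with a minimal Python sketch. It assumes simplified stand-ins: {{Source}} for {{TableSource}}, {{SourceTable}} for {{TableSourceTable}}, and {{ScanBase}} for Calcite's {{TableScan}}, whose {{table}} attribute plays the role of the inherited {{RelOptTable}} (in Calcite that reference wraps the table rather than being the table itself). {{CurrentScan}} and {{ProposedScan}} are hypothetical names, not Flink classes.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Source:
    """Hypothetical stand-in for a TableSource."""
    field_names: Tuple[str, ...]


@dataclass(frozen=True)
class SourceTable:
    """Hypothetical stand-in for TableSourceTable, wrapping one source."""
    source: Source


class ScanBase:
    """Stand-in for Calcite's TableScan; keeps the table it was created from.

    Simplification: the table reference here is the wrapper itself, whereas in
    Calcite it is a RelOptTable that wraps the table.
    """

    def __init__(self, table: SourceTable):
        self.table = table


class CurrentScan(ScanBase):
    # Current design: the scan holds a SourceTable, which holds the source.
    def __init__(self, table: SourceTable, source_table: SourceTable):
        super().__init__(table)
        self.source_table = source_table


class ProposedScan(ScanBase):
    # Proposed design: the scan holds the source directly.
    def __init__(self, table: SourceTable, source: Source):
        super().__init__(table)
        self.source = source

    def row_type(self) -> Tuple[str, ...]:
        # The row type is decided by the source alone.
        return self.source.field_names
```

After a projection push-down that replaces a source with fields (a, b, c) by one with (a, c), a {{CurrentScan}} ends up referencing two different sources (the original through {{table}}, the new one through {{source_table}}), which is exactly the ambiguity described above. A {{ProposedScan}} takes its row type only from the new source, while the original information remains reachable through {{table}}.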
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)