Fabian Hueske commented on FLINK-5185:

Hi [~ykt836], you are right, I did not consider the case of pushing down 
complete queries to sources. 
However, I'm not sure if we need to integrate this with {{BatchTableSourceScan}}. 
It might make more sense to have a special {{RelNode}} implementation for this 
more complex use case.

I would make support for pushing more complex operations (aggregations, joins) 
into a {{TableSource}} a separate issue. 
To be honest, I have not thought much about how this would be realized. I think 
the Calcite community should have some experience with this and should be able 
to give good advice here.

Regarding the question of "schema authority", I think the schema should be 
governed by the {{RelNode}}, i.e., the optimizer's representation of operators. 
In my opinion, it should be the optimizer's decision to push operations into 
sources. As soon as the optimizer decides on such a rewrite, it must adapt the 
schema of all involved operators. I do not see how this decision could be left 
to the {{TableSource}} without going through the optimizer and its rewrite 
rules. Of course there needs to be an interaction between {{TableSource}} and 
{{RelOptRule}} to identify which operations can be pushed down. I am not sure 
what the interface for this would look like.
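
To make the division of responsibility concrete, here is a minimal sketch in 
plain Java. It is not Flink or Calcite code: {{Source}}, {{DemoSource}}, 
{{Scan}} and {{pushProjection}} are hypothetical stand-ins for 
{{TableSource}}, a concrete source, the scan {{RelNode}} and a {{RelOptRule}}. 
The assumption illustrated is that the source only reports whether it can 
accept a projection, while the rule decides on the rewrite and computes the 
new row type of the scan.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ProjectionPushDownSketch {

    // Hypothetical stand-in for a TableSource; this is NOT Flink's interface.
    interface Source {
        boolean supportsProjection();
        Source projectFields(int[] fields);
    }

    // Hypothetical source that remembers which fields it was asked to read.
    static final class DemoSource implements Source {
        final int[] readFields; // null means "read all fields"

        DemoSource(int[] readFields) {
            this.readFields = readFields;
        }

        public boolean supportsProjection() {
            return true;
        }

        public Source projectFields(int[] fields) {
            return new DemoSource(fields.clone());
        }
    }

    // Hypothetical stand-in for the scan RelNode: it owns the row type (schema).
    static final class Scan {
        final Source source;
        final List<String> rowType;

        Scan(Source source, List<String> rowType) {
            this.source = source;
            this.rowType = rowType;
        }
    }

    // Hypothetical stand-in for a RelOptRule: the rule asks the source whether
    // the projection can be pushed down, then builds a new scan whose schema
    // is computed here, on the optimizer side. The original scan is untouched.
    static Scan pushProjection(Scan scan, int[] fields) {
        if (!scan.source.supportsProjection()) {
            return scan; // nothing to rewrite
        }
        List<String> narrowed = new ArrayList<>();
        for (int f : fields) {
            narrowed.add(scan.rowType.get(f));
        }
        return new Scan(scan.source.projectFields(fields), narrowed);
    }

    public static void main(String[] args) {
        Scan scan = new Scan(new DemoSource(null), Arrays.asList("id", "name", "price"));
        Scan pushed = pushProjection(scan, new int[] {0, 2});
        System.out.println(pushed.rowType); // prints [id, price]
    }
}
```

In this sketch the source never changes the schema on its own; it is only 
handed the field indexes that the rule has already decided to push down.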

Do you think that the solution I proposed (optional projection / filter 
parameters for {{BatchTableSourceScan}}) would suffice for the "simpler" cases 
of selection and projection push down (which are applicable to many more 

> Decouple BatchTableSourceScan with TableSourceTable
> ---------------------------------------------------
>                 Key: FLINK-5185
>                 URL: https://issues.apache.org/jira/browse/FLINK-5185
>             Project: Flink
>          Issue Type: Improvement
>          Components: Table API & SQL
>    Affects Versions: 1.2.0
>            Reporter: Kurt Young
>            Assignee: zhangjing
>            Priority: Minor
> The component relationships are shown in this design doc:
> https://docs.google.com/document/d/1PBnEbOcFHlEF1qGGAUgJvINdEXzzFTIRElgvs4-Tdeo/
> We found it awkward that {{BatchTableSourceScan}} directly holds a 
> {{TableSourceTable}}, which in turn refers to the {{TableSource}}. This is 
> fine as long as the relationship is immutable, but when we want to change 
> the {{TableSource}} while applying optimizations, it causes conflicts and 
> confusion. 
> There is only one way to change the {{TableSource}}: create a new 
> {{TableSourceTable}} to hold the new {{TableSource}}, and create a new 
> {{BatchTableSourceScan}} pointing to the newly created {{TableSourceTable}}. 
> The annoying part is that the {{RelOptTable}} inherited from the super class 
> {{TableScan}} still holds the connection to the original {{TableSourceTable}} 
> and {{TableSource}}. This makes it unclear which table the {{Scan}} should 
> rely on and what the difference between these tables is. 
> Besides, {{TableSourceTable}} is not very useful in {{BatchTableSourceScan}}: 
> the only thing the {{Scan}} cares about is the {{RowType}} it returns, which 
> is and should be decided by the {{TableSource}}. So we can let 
> {{BatchTableSourceScan}} hold the {{TableSource}} directly instead of the 
> {{TableSourceTable}}. If the original information is needed, the table can 
> be found through the {{RelOptTable}}. 
