sunjincheng121 edited a comment on issue #9748: 
[FLINK-14016][python][flink-table-planner] Introduce 
FlinkLogicalPythonScalarFunctionExec and DataStreamPythonScalarFunctionExec for 
Python function execution
URL: https://github.com/apache/flink/pull/9748#issuecomment-534481069
 
 
   Thanks a lot for your contribute @dianfu . I have gone through the PR. My 
main comments are below:
   - For the logical node, I think we don't need to create a new node for the 
python, instead, we can reuse the current `FlinkLogicalCalc` node. In this way, 
not only we can reuse most of the code, but also I think it matches the 
semantic that it is a Calc.
   - For the physical node, we can create a `DataStreamCalcBase` node and let 
both `DataStreamPythonCalc` and `DataStreamCalc` extend it.
   - For the split rule, maybe we can further split the calc a bit, i.e., don't 
mix java and python UDFs rather than further split them in translateToPlan. 
Doing all split in the split rule is more clear.
   - For the filter, we calculate these condition UDFs together with other UDFs 
and do the filter later. I think we can optimize it a bit, i.e., calculate the 
conditions first and then check whether to call the other UDFs. This can be 
easily achieved in the SplitRule.
   
   Last for the new optimization phase, I agree with the current design. 
Another choice is using the current logic optimization phase, but I think it 
would bring some side effects 
   like a) The PythonSplitRule would conflict with the CalcMerge rules and we 
need to do some hack for the merge rules. b) Adding all rules in logical phase 
makes the time of optimization longer and longer. This is the reason that we 
have a lot of Rule based optimization phase in Blink.
   
   What do you think? @dianfu @twalthr :)

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

Reply via email to