mingmwang commented on code in PR #5322:
URL: https://github.com/apache/arrow-datafusion/pull/5322#discussion_r1119865212


##########
datafusion/core/src/physical_optimizer/pipeline_fixer.rs:
##########
@@ -182,13 +289,46 @@ fn apply_subrules_and_check_finiteness_requirements(
     physical_optimizer_subrules: &Vec<Box<PipelineFixerSubrule>>,
 ) -> Result<Option<PipelineStatePropagator>> {
     for sub_rule in physical_optimizer_subrules {
-        if let Some(value) = sub_rule(&input).transpose()? {
+        if let Some(value) = sub_rule(input.clone()).transpose()? {
             input = value;
         }
     }
     check_finiteness_requirements(input)
 }

Review Comment:
   Good to know. In the past, there was some discussion about enhancing 
Ballista to support both BATCH and STREAMING execution models:
   
   https://docs.google.com/document/d/1OdAe078axk4qO0ozUxNqBMD4wKoBhzh9keMuLp_jerE/edit#
   http://www.vldb.org/pvldb/vol11/p746-yin.pdf
   
   I haven't worked on Ballista since last year, and there has been no 
progress in this area.
   
   In the latest Flink release, they implemented similar features (bubble 
execution model, hybrid shuffle, etc.):
   
https://cwiki.apache.org/confluence/display/FLINK/FLIP-235%3A+Hybrid+Shuffle+Mode
   
   I think we can generally follow Flink's approach to make both DataFusion 
and Ballista support BATCH/STREAMING execution models. At a high level, we can 
have different execution models (BATCH vs. STREAMING), and the user can specify 
which one to use. In the physical planning phase, we would have a 
`BatchPlanner` and a `StreamingPlanner`; they can share some common rules, and 
each planner can also have its own rules. 
   Alongside the `ExecutionPlan` trait, we can have another trait to indicate 
that some operators are `Source` operators; a source operator can be `BOUNDED` 
or `UNBOUNDED`. `BOUNDED`/`UNBOUNDED` should be a property available to 
source operators only. 
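   
   To make the idea concrete, here is a rough, self-contained sketch of how 
this could look. Every name in it (`PlanNode`, `Source`, `Boundedness`, 
`ExecutionModel`, `FileScan`, `StreamScan`, `check_model`, `rules_for`, and the 
rule names) is hypothetical and does not exist in DataFusion or Ballista; 
`PlanNode` only stands in for `ExecutionPlan`. It is meant to illustrate the 
shape of the proposal, not a definitive design.

```rust
// Hypothetical sketch: none of these types exist in DataFusion or Ballista.

/// Whether a source yields a finite or a potentially infinite stream.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Boundedness {
    Bounded,
    Unbounded,
}

/// Execution model selected by the user.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ExecutionModel {
    Batch,
    Streaming,
}

/// Separate trait implemented by source operators only.
trait Source {
    fn boundedness(&self) -> Boundedness;
}

/// Simplified stand-in for `ExecutionPlan`.
trait PlanNode {
    fn name(&self) -> &str;
    fn children(&self) -> &[Box<dyn PlanNode>];
    /// Non-source operators keep the default and expose no boundedness.
    fn as_source(&self) -> Option<&dyn Source> {
        None
    }
}

struct FileScan; // bounded source (assumed example)
struct StreamScan; // unbounded source (assumed example)
struct Filter {
    inputs: Vec<Box<dyn PlanNode>>,
}

impl Source for FileScan {
    fn boundedness(&self) -> Boundedness {
        Boundedness::Bounded
    }
}
impl Source for StreamScan {
    fn boundedness(&self) -> Boundedness {
        Boundedness::Unbounded
    }
}
impl PlanNode for FileScan {
    fn name(&self) -> &str { "FileScan" }
    fn children(&self) -> &[Box<dyn PlanNode>] { &[] }
    fn as_source(&self) -> Option<&dyn Source> { Some(self) }
}
impl PlanNode for StreamScan {
    fn name(&self) -> &str { "StreamScan" }
    fn children(&self) -> &[Box<dyn PlanNode>] { &[] }
    fn as_source(&self) -> Option<&dyn Source> { Some(self) }
}
impl PlanNode for Filter {
    fn name(&self) -> &str { "Filter" }
    fn children(&self) -> &[Box<dyn PlanNode>] { &self.inputs }
}

/// Walks the plan and reports whether any source is unbounded.
fn has_unbounded_source(plan: &dyn PlanNode) -> bool {
    if let Some(source) = plan.as_source() {
        return source.boundedness() == Boundedness::Unbounded;
    }
    plan.children().iter().any(|c| has_unbounded_source(c.as_ref()))
}

/// The batch model rejects plans that read from an unbounded source.
fn check_model(model: ExecutionModel, plan: &dyn PlanNode) -> Result<(), String> {
    match model {
        ExecutionModel::Batch if has_unbounded_source(plan) => Err(format!(
            "batch model cannot run plan rooted at `{}`: unbounded source",
            plan.name()
        )),
        _ => Ok(()),
    }
}

/// Both planners share common rules and add their own (placeholder names).
fn rules_for(model: ExecutionModel) -> Vec<&'static str> {
    let mut rules = vec!["common_rule"];
    match model {
        ExecutionModel::Batch => rules.push("batch_only_rule"),
        ExecutionModel::Streaming => rules.push("streaming_only_rule"),
    }
    rules
}

fn main() {
    let plan = Filter { inputs: vec![Box::new(StreamScan) as Box<dyn PlanNode>] };
    println!("{:?}", check_model(ExecutionModel::Batch, &plan));
    println!("{:?}", check_model(ExecutionModel::Streaming, &plan));
    println!("{:?}", rules_for(ExecutionModel::Streaming));
}
```

   With this shape, only leaf operators answer the bounded/unbounded 
question, and intermediate operators such as `Filter` never need to carry the 
property themselves.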
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

