demetribu commented on issue #12905:
URL: https://github.com/apache/datafusion/issues/12905#issuecomment-2524060879

   I tried to implement this in a naive way by just adding the following code:
   
   ```
   let target_partitions = 
self.state().config_options().execution.target_partitions;
   
   let physical = DataFrame::new(self.state(), input);
   let batches: Vec<_> = physical
       .repartition(Partitioning::RoundRobinBatch(target_partitions))?
       .collect_partitioned().await?;
   ```
   
   to 
https://github.com/apache/datafusion/blob/fc703238b1d7794bd132a7fb6b97cad9ba4c7446/datafusion/core/src/execution/context/mod.rs#L795
   
   But I found that doing Repartition on top of a simple Projection gets 
overridden by physical optimization: 
https://github.com/apache/datafusion/blob/fc703238b1d7794bd132a7fb6b97cad9ba4c7446/datafusion/core/src/physical_optimizer/enforce_distribution.rs#L975
   
   PS: When I skip remove_dist_changing_operators, the behavior still looks 
weird. The number of partitions matches target_partitions, but the data isn’t 
distributed properly. There’s one partition with all the data and others are 
empty.
   
   
   @jayzhan211 Would appreciate any ideas. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to