demetribu commented on issue #12905: URL: https://github.com/apache/datafusion/issues/12905#issuecomment-2524060879
I tried to implement this in a naive way by just adding the following code: ``` let target_partitions = self.state().config_options().execution.target_partitions; let physical = DataFrame::new(self.state(), input); let batches: Vec<_> = physical .repartition(Partitioning::RoundRobinBatch(target_partitions))? .collect_partitioned().await?; ``` to https://github.com/apache/datafusion/blob/fc703238b1d7794bd132a7fb6b97cad9ba4c7446/datafusion/core/src/execution/context/mod.rs#L795 But I found that doing Repartition on top of a simple Projection gets overridden by physical optimization: https://github.com/apache/datafusion/blob/fc703238b1d7794bd132a7fb6b97cad9ba4c7446/datafusion/core/src/physical_optimizer/enforce_distribution.rs#L975 PS: When I skip remove_dist_changing_operators, the behavior still looks weird. The number of partitions matches target_partitions, but the data isn’t distributed properly. There’s one partition with all the data and others are empty. @jayzhan211 Would appreciate any ideas. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org