palaska commented on issue #12611: URL: https://github.com/apache/datafusion/issues/12611#issuecomment-2379813052
> In datafusion, target_partition argument doesn't necessarily increase partition count each time. If DataFusion thinks that executing the query in single partition is better in terms of performance, it will do so even if target_partition number is larger than 1. Do you think, parallelism will improve the performance for this query, if you think so we should definitely increase partition for this query. What is your thoughts in this regard?
>
> In short, setting target_partitions to larger than 1 doesn't necessarily increase partition in datafusion.

Thanks for the explanation! I agree that optimizing for performance makes sense, as long as it doesn't compromise guarantees or hurt system predictability. In Ballista, a "task" is generated for each partition, and changing this behavior has caused some unit test assertions to fail. However, I don't think this is a major issue for Ballista.

I'm not familiar with how this flag is being used in other systems, but @alamb might have some insights to share.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
