asolimando commented on PR #19124: URL: https://github.com/apache/datafusion/pull/19124#issuecomment-3626407014
> Because we can have fewer file partition groups than target_partitions, forcing a partition group (with possibly large amounts of data) to be read in a single partition can increase file I/O. This configuration choice was made to be able to control the amount of I/O overhead a user is willing to have in order to eliminate shuffles. (This was recommended by @gabotechs and is a great approach to have more granularity over this behavior rather than a boolean flag, thank you) I read this knob as primarily controlling the tradeoff between scan parallelism vs I/O overhead and eliminating shuffles, which is very useful and more expressive than a boolean. However, I think we should also consider the impact of data skew. For heavily skewed tables, preserving file partitioning can make the scan itself significantly unbalanced (one or a few partition groups doing most of the I/O), and in those cases you might actually prefer to pay the shuffle cost rather than constrain execution to the file partition layout. That’s why I think it’s fine to keep a global configuration option, but ideally we would also support passing a different value per scan. In practice you may have: - a skewed table, where you don’t want to match in-memory partitioning to file partitions, and - other tables where you’re perfectly happy preserving file partitioning. If adding per-scan configuration is too much for this PR, it’s probably enough to call it out explicitly as follow-up work, but I think this skew aspect is important to keep in mind for the feature’s overall design. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
