Re: [I] incorrect dynamic filter on partitioned join when file partitioned [datafusion]

via GitHub Fri, 06 Feb 2026 00:14:05 -0800


gabotechs commented on issue #20176:
URL: https://github.com/apache/datafusion/issues/20176#issuecomment-3858719580

> A long term fix is to introduce a new type of partitioning for the file
> partitioning to safely distinguish the two. Something like KeyPartitioned or
> ValuePartitioned is suiting.

I think the problem goes beyond that. Even if both sides of a join report
`Partitioning::Hash` because each has a `RepartitionExec` upstream, there is no
guarantee that both were partitioned with the same strategy. For example:
- What if both sides of the join were manually repartitioned by the user
with a custom rule, and the random seed used to build the hashes is different?
- What if in the future we want a new algorithm for `RepartitionExec` that is
capable of adaptively increasing or decreasing the number of output
partitions? Both sides would still need to match.
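
To make the first bullet concrete, here is a minimal sketch. It is not DataFusion code: `partition_for` and `misrouted_keys` are hypothetical names, and the standard library's `DefaultHasher` only stands in for whatever hash a repartitioning step would use. It shows that two sides can both be "hash partitioned" into the same number of partitions and still disagree on where a given key lives.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for a hash partitioner: mixes `seed` into the hash
/// state, then maps `key` to one of `num_partitions` partition indices.
fn partition_for(key: u64, seed: u64, num_partitions: u64) -> u64 {
    let mut hasher = DefaultHasher::new();
    seed.hash(&mut hasher);
    key.hash(&mut hasher);
    hasher.finish() % num_partitions
}

/// Counts the keys in `0..num_keys` that the left and right side of a join
/// would route to different partition indices.
fn misrouted_keys(left_seed: u64, right_seed: u64, num_partitions: u64, num_keys: u64) -> usize {
    (0..num_keys)
        .filter(|&key| {
            partition_for(key, left_seed, num_partitions)
                != partition_for(key, right_seed, num_partitions)
        })
        .count()
}
```

With identical seeds, `misrouted_keys` returns zero, so partition `i` on the left only ever needs to meet partition `i` on the right. With different seeds, some keys land in different partitions, and a partitioned join (or a dynamic filter built per partition) that assumes matching indices would silently miss those rows.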

By the same logic that motivates introducing `KeyPartitioned` or similar, we
could argue that even more partitioning modes would need to be added, even
though all of these partitioning methods match the current definition of "Hash
Partitioned" (a bit of an unfortunate name).

> Oracle calls this
[ListPartitioning](https://docs.oracle.com/en/database/oracle/oracle-database/26/cncpt/partitions-views-and-other-schema-objects.html)

Note that this refers to how data is physically laid out in persistent
storage. The document you shared describes how to partition data storage,
whereas the problem here is how to partition compute resources for reads.
