gene-bordegaray opened a new issue, #20176:
URL: https://github.com/apache/datafusion/issues/20176
### Describe the bug
When preserving file partitioning we claim "hash" partitioned on the columns
values. Then when applying a dynamic filter for partitioned joins it creates a
filter based on a true hash function, not key/value based like file
partitioning. This causes partitions to be filtered incorrectly:
```text
Declare file partitioned: Hash([partition_col], N)
Filter with hash-based routing:
CASE hash(key) % N
WHEN 0 THEN <partition 0 bounds>
WHEN 1 THEN <partition 1 bounds>
...
Since file partitioned is using value based partitioning and dynamic
filtering is using true hash based end of in this scenario:
┌─────┬──────────────────────┬───────────────┐
│ Key │ Hive Partition Index │ hash(key) % 3 │
├─────┼──────────────────────┼───────────────┤
│ "A" │ 0 │ 2 │
├─────┼──────────────────────┼───────────────┤
│ "B" │ 1 │ 2 │
├─────┼──────────────────────┼───────────────┤
│ "C" │ 2 │ 2 │
└─────┴──────────────────────┴───────────────┘
thus, when the filter evaluates row f_dkey=A, the filter computes it to map
to partition 2 bounds even though it is actually in partition 0. So it doesn't
match and is incorrectly filtered out.
```
### To Reproduce
see PR: #20175
### Expected behavior
For now dynamic filtering will be disabled for partitioned hash joins when
preserve file partitions is on.
A long term fix is to introduce a new type of partitioning for the file
partitioning to safely distinguish the two. Something like `KeyPartitoned` or
`ValuePartitioned` is suiting.
Oracle calls this
[`ListPartitioning`](https://docs.oracle.com/en/database/oracle/oracle-database/26/cncpt/partitions-views-and-other-schema-objects.html)
### Additional context
_No response_
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]