adriangb opened a new pull request, #19379:
URL: https://github.com/apache/datafusion/pull/19379
## Summary
This PR adds protobuf serialization/deserialization support for `HashExpr`,
enabling distributed query execution to serialize hash expressions used in hash
joins and repartitioning.
### Key Changes
- **SeededRandomState wrapper**: Added a `SeededRandomState` struct that
wraps `ahash::RandomState` while preserving the seeds used to create it. This
is necessary because `RandomState` doesn't expose seeds after creation, but we
need them for serialization.
- **Updated seed constants**: Changed `HASH_JOIN_SEED` and
`REPARTITION_RANDOM_STATE` constants to use `SeededRandomState` instead of raw
`RandomState`.
- **HashExpr enhancements**:
  - Changed `HashExpr` to use `SeededRandomState`
  - Added getter methods: `on_columns()`, `seeds()`, `description()`
  - Exported `HashExpr` and `SeededRandomState` from the joins module
- **Protobuf support**:
  - Added `PhysicalHashExprNode` message to `datafusion.proto` with fields
    for `on_columns`, seeds (4 `u64` values), and `description`
  - Implemented serialization in `to_proto.rs`
  - Implemented deserialization in `from_proto.rs`
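For readers unfamiliar with the wrapper pattern, here is a minimal, std-only sketch of the idea behind `SeededRandomState`. It is not the code in this PR: `OpaqueState` is a hypothetical stand-in for `ahash::RandomState` (which is likewise built from four `u64` seeds via `with_seeds` but does not give them back), and the method names on the wrapper are assumptions for illustration.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Hypothetical stand-in for `ahash::RandomState`: it is built from four
/// seeds but only keeps a mixed value, so the seeds cannot be read back.
#[derive(Clone, Debug)]
struct OpaqueState {
    mixed: u64,
}

impl OpaqueState {
    fn with_seeds(k0: u64, k1: u64, k2: u64, k3: u64) -> Self {
        // Fold the four seeds into one value; the originals are lost here.
        let mut h = DefaultHasher::new();
        for k in [k0, k1, k2, k3] {
            h.write_u64(k);
        }
        Self { mixed: h.finish() }
    }

    /// Hash a single value under this state.
    fn hash_one(&self, value: u64) -> u64 {
        let mut h = DefaultHasher::new();
        h.write_u64(self.mixed);
        h.write_u64(value);
        h.finish()
    }
}

/// Sketch of the wrapper: the hashing state plus the seeds that built it,
/// kept side by side so the seeds can be serialized later.
#[derive(Clone, Debug)]
struct SeededRandomState {
    state: OpaqueState,
    seeds: [u64; 4],
}

impl SeededRandomState {
    fn with_seeds(k0: u64, k1: u64, k2: u64, k3: u64) -> Self {
        Self {
            state: OpaqueState::with_seeds(k0, k1, k2, k3),
            seeds: [k0, k1, k2, k3],
        }
    }

    /// The seeds used at construction time, in order.
    fn seeds(&self) -> [u64; 4] {
        self.seeds
    }

    /// Borrow the underlying hashing state.
    fn random_state(&self) -> &OpaqueState {
        &self.state
    }
}
```

Because the seeds travel with the state, a receiver that is given only the four values can rebuild an equivalent state and produce the same hashes as the sender.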
## Test plan
- [x] Added roundtrip test in `roundtrip_physical_plan.rs` that creates a
`HashExpr`, serializes it, deserializes it, and verifies the result
- [x] All existing hash join tests pass (583 tests)
- [x] All proto roundtrip tests pass
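The shape of the roundtrip check can be sketched as follows. This is a simplified, std-only illustration and not the test in `roundtrip_physical_plan.rs`: `HashExprNode` and `SimpleHashExpr` are hypothetical plain structs standing in for the generated `PhysicalHashExprNode` message and for `HashExpr`, columns are represented by name rather than as physical expressions, and the field names are assumptions.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Hypothetical plain-struct mirror of the `PhysicalHashExprNode` message.
/// The real type is generated from `datafusion.proto`.
#[derive(Debug, Clone, PartialEq)]
struct HashExprNode {
    on_columns: Vec<String>,
    seeds: [u64; 4],
    description: String,
}

/// Simplified stand-in for `HashExpr`.
#[derive(Debug, Clone, PartialEq)]
struct SimpleHashExpr {
    on_columns: Vec<String>,
    seeds: [u64; 4],
    description: String,
}

impl SimpleHashExpr {
    /// "Serialize": copy every field needed to rebuild the expression.
    fn to_node(&self) -> HashExprNode {
        HashExprNode {
            on_columns: self.on_columns.clone(),
            seeds: self.seeds,
            description: self.description.clone(),
        }
    }

    /// "Deserialize": rebuild the expression from the message fields alone.
    fn from_node(node: &HashExprNode) -> Self {
        Self {
            on_columns: node.on_columns.clone(),
            seeds: node.seeds,
            description: node.description.clone(),
        }
    }

    /// Seeded hash of one value; deterministic for a given set of seeds.
    fn hash_value(&self, value: u64) -> u64 {
        let mut h = DefaultHasher::new();
        for s in self.seeds {
            h.write_u64(s);
        }
        h.write_u64(value);
        h.finish()
    }
}

/// Convert to the message form and back again.
fn roundtrip(expr: &SimpleHashExpr) -> SimpleHashExpr {
    SimpleHashExpr::from_node(&expr.to_node())
}
```

A check in this style would assert that the rebuilt expression equals the original and that both hash the same input to the same value, which is the property that matters when different nodes must agree on hash-based partitioning.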
🤖 Generated with [Claude Code](https://claude.com/claude-code)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]