eldenmoon opened a new pull request, #64094:
URL: https://github.com/apache/doris/pull/64094
### What problem does this PR solve?
Issue Number: None
Related PR: None
Problem Summary: Local shuffle computes CRC32C hashes for nullable columns
when routing rows. The optimized nullable hash path normalized null rows by
mutating the nullable column's nested column during a logically const hash
calculation. If that column instance is shared by exchange or join processing,
the mutation can leak to later consumers and cause unstable local shuffle
results or internal errors. This change keeps the no-null fast path, but clones
the nested column before replacing null data when null rows are present, so
hash normalization is isolated to a temporary column. A unit test covers the
regression by asserting that nullable hashing leaves the nested values unchanged.
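
The clone-before-normalize idea described above can be sketched roughly as follows. This is a minimal illustration, not the Doris implementation: `NullableInt32Column`, `mix`, `hash_values`, and `update_hashes` are hypothetical names, the per-row hash is a toy multiply-xor step standing in for CRC32C, and using `0` as the placeholder for null rows is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a nullable column: nested values plus a null map.
struct NullableInt32Column {
    std::vector<int32_t> nested;    // may hold arbitrary values at null rows
    std::vector<uint8_t> null_map;  // 1 = row is NULL, 0 = row has a value
};

// Toy per-row hash step (NOT CRC32C); it only stands in for the real hash.
inline uint32_t mix(uint32_t seed, int32_t value) {
    return (seed ^ static_cast<uint32_t>(value)) * 16777619u;
}

// Folds each value into the running hash of its row.
inline void hash_values(const std::vector<int32_t>& values,
                        std::vector<uint32_t>& hashes) {
    for (std::size_t i = 0; i < values.size(); ++i) {
        hashes[i] = mix(hashes[i], values[i]);
    }
}

// Updates per-row hashes without ever writing to the caller's column.
inline void update_hashes(const NullableInt32Column& col,
                          std::vector<uint32_t>& hashes) {
    assert(col.nested.size() == col.null_map.size());
    assert(hashes.size() == col.nested.size());

    bool has_null = false;
    for (uint8_t flag : col.null_map) {
        if (flag != 0) {
            has_null = true;
            break;
        }
    }

    if (!has_null) {
        // Fast path: no null rows, so hash the nested data directly, no copy.
        hash_values(col.nested, hashes);
        return;
    }

    // Slow path: normalize null rows on a private copy so that the shared
    // nested column is left untouched for later consumers.
    std::vector<int32_t> normalized = col.nested;
    for (std::size_t i = 0; i < normalized.size(); ++i) {
        if (col.null_map[i] != 0) {
            normalized[i] = 0;  // assumed placeholder value for null rows
        }
    }
    hash_values(normalized, hashes);
}
```

In this sketch, two columns that differ only in the garbage stored under null rows produce identical hashes, while the nested data of both stays as it was. The copy is paid only when a null row is actually present.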
### Release note
None
### Check List (For Author)
- Test:
    - BE UT: `./run-be-ut.sh --run --filter='ColumnNullableTest.UpdateCrc32cBatchDoesNotMutateNestedColumn'`
    - Manual test: a constructed local-shuffle query on a local 4-BE cluster matched the no-local-shuffle baseline for 5 iterations
    - Manual test: the provided repro SQL for the nullable local-shuffle internal-error case passed 100 iterations on a local 4-BE cluster
- Behavior changed: No
- Does this need documentation: No
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]