BiteTheDDDDt opened a new pull request, #60526:
URL: https://github.com/apache/doris/pull/60526
### What problem does this PR solve?
This pull request introduces a new serialization format for
`DataTypeFixedLengthObject` columns, leveraging streamvbyte encoding for
efficient storage and transmission of large data blocks. The new format is
activated for BE exec version 10 and above, which is now set as the maximum
supported version. Additionally, the `AggregateFunctionCount` and
`AggregateFunctionCountNotNullUnary` functions are marked as trivial, likely
for optimization purposes. Below are the most important changes:
### Serialization and Deserialization Improvements
* Introduced a new serialization/deserialization format for
`DataTypeFixedLengthObject` that uses streamvbyte encoding for large data,
improving efficiency for big data columns. The new logic is gated behind BE
exec version 10 and includes fallback to the previous format for older versions
(`be/src/vec/data_types/data_type_fixed_length_object.cpp`).
* Updated the calculation of uncompressed serialized bytes to account for
the new serialization format and potential streamvbyte compression
(`be/src/vec/data_types/data_type_fixed_length_object.cpp`).
* Added the `streamvbyte` library include to support the new
encoding/decoding logic
(`be/src/vec/data_types/data_type_fixed_length_object.cpp`).
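
For readers unfamiliar with the encoding, the following is a minimal scalar sketch of the Stream VByte layout: a run of control bytes (2 bits per 32-bit value, storing the byte length minus one) followed by the variable-length data bytes. The names `svb_encode` and `svb_decode` are hypothetical and used only for illustration. The real `streamvbyte` library is SIMD-accelerated and has its own API, and this sketch does not show how the PR applies the encoding to `DataTypeFixedLengthObject` buffers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative scalar encoder (hypothetical helper, not the library API).
// Layout: [control bytes][data bytes]. Each value gets a 2-bit code equal to
// (number of payload bytes - 1); four codes are packed into one control byte.
std::vector<uint8_t> svb_encode(const std::vector<uint32_t>& in) {
    const size_t n = in.size();
    const size_t ctrl_len = (n + 3) / 4;
    std::vector<uint8_t> out(ctrl_len, 0);
    for (size_t i = 0; i < n; ++i) {
        const uint32_t v = in[i];
        const uint8_t code = v < (1u << 8) ? 0 : v < (1u << 16) ? 1 : v < (1u << 24) ? 2 : 3;
        out[i / 4] |= static_cast<uint8_t>(code << (2 * (i % 4)));
        for (uint8_t b = 0; b <= code; ++b) {
            // Payload bytes are written least-significant first.
            out.push_back(static_cast<uint8_t>(v >> (8 * b)));
        }
    }
    return out;
}

// Illustrative scalar decoder; the caller must supply the value count `n`,
// because the encoded stream does not carry it.
std::vector<uint32_t> svb_decode(const std::vector<uint8_t>& in, size_t n) {
    std::vector<uint32_t> out(n, 0);
    size_t pos = (n + 3) / 4;  // data bytes start right after the control bytes
    for (size_t i = 0; i < n; ++i) {
        const uint8_t code = (in[i / 4] >> (2 * (i % 4))) & 0x3;
        for (uint8_t b = 0; b <= code; ++b) {
            out[i] |= static_cast<uint32_t>(in[pos++]) << (8 * b);
        }
    }
    return out;
}
```

Small values cost one payload byte plus a quarter of a control byte, which is why the encoding only pays off when the buffer holds many values with leading zero bytes.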
### Version Management
* Increased `BeExecVersionManager::max_be_exec_version` from 8 to 10, with
detailed documentation and warnings about the sensitivity of this field. The
new version enables the updated serialization logic
(`be/src/agent/be_exec_version_manager.cpp`).
* Defined a new constant `USE_NEW_FIXED_OBJECT_SERIALIZATION_VERSION = 10`
to clearly mark the threshold for the new serialization format
(`be/src/agent/be_exec_version_manager.h`).
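
The gating described above can be sketched roughly as follows. Only the constant name and its value come from this PR description; the enum, the function name `choose_fixed_object_format`, and the `>=` comparison (inferred from "version 10 and above") are assumptions made for illustration.

```cpp
#include <cassert>

// Value taken from the PR description.
constexpr int USE_NEW_FIXED_OBJECT_SERIALIZATION_VERSION = 10;

// Hypothetical enum naming the two wire formats.
enum class FixedObjectFormat { Legacy, New };

// Hypothetical helper: select the format from the negotiated BE exec version,
// so that peers on an older version still receive the previous layout.
constexpr FixedObjectFormat choose_fixed_object_format(int be_exec_version) {
    return be_exec_version >= USE_NEW_FIXED_OBJECT_SERIALIZATION_VERSION
                   ? FixedObjectFormat::New
                   : FixedObjectFormat::Legacy;
}
```

Keeping the threshold in a single named constant means the serializer, the deserializer, and the size estimate can all branch on the same condition.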
### Aggregate Function Optimization
* Marked `AggregateFunctionCount` and `AggregateFunctionCountNotNullUnary`
as trivial by overriding the `is_trivial()` method to return `true`, which may
allow for performance optimizations in the aggregation engine
(`be/src/vec/aggregate_functions/aggregate_function_count.h`).
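
As a rough illustration of the override pattern, the sketch below uses stand-in types rather than the Doris classes. The base-class default of `false`, the type names, and the `can_use_trivial_path` helper are all assumptions; the PR description only states that the two count functions return `true` from `is_trivial()`.

```cpp
#include <cassert>

// Stand-in for the aggregate function interface (hypothetical).
struct AggregateFunctionSketch {
    virtual ~AggregateFunctionSketch() = default;
    // Assumed default: functions are not trivial unless they opt in.
    virtual bool is_trivial() const { return false; }
};

// Stand-in for a count-style function that opts in.
struct CountSketch final : AggregateFunctionSketch {
    bool is_trivial() const override { return true; }
};

// Stand-in for a function that keeps the default.
struct OtherSketch final : AggregateFunctionSketch {};

// Hypothetical caller: an engine could branch on the flag to pick a cheaper path.
inline bool can_use_trivial_path(const AggregateFunctionSketch& fn) {
    return fn.is_trivial();
}
```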
### Check List (For Author)
- Test <!-- At least one of them must be included. -->
    - [ ] Regression test
    - [ ] Unit Test
    - [ ] Manual test (add detailed scripts or steps below)
    - [ ] No need to test or manual test. Explain why:
        - [ ] This is a refactor/code format and no logic has been changed.
        - [ ] Previous test can cover this change.
        - [ ] No code files have been changed.
        - [ ] Other reason <!-- Add your reason? -->
- Behavior changed:
    - [ ] No.
    - [ ] Yes. <!-- Explain the behavior change -->
- Does this need documentation?
    - [ ] No.
    - [ ] Yes. <!-- Add document PR link here. eg: https://github.com/apache/doris-website/pull/1214 -->
### Check List (For Reviewer who merge this PR)
- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label <!-- Add branch pick label that this PR should
merge into -->