mrhhsg opened a new pull request, #64060:
URL: https://github.com/apache/doris/pull/64060
### What problem does this PR solve?
Issue Number: None
Related PR: None
Problem Summary: Arrow block conversion previously switched an oversized
UTF8 column to a Large UTF8 builder once the column byte size reached the Arrow
UTF8 limit. However, the output schema still declares the original UTF8 field,
so the builder type and the schema disagree, and Doris does not support this
Large UTF8 conversion path yet. This PR returns a clear InvalidArgument error
instead. The message includes the column byte size, the configured limit, and a
suggestion to reduce batch_size. It also adds a focused BE unit test that uses a
dummy column whose byte_size reaches the limit, so the limit branch is covered
without allocating a huge string column.
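
The check and the test approach described above can be sketched as follows. This is a minimal, standalone illustration and not the code in this PR: `ColumnLike`, `DummyColumn`, `check_utf8_column_size`, and `kAssumedUtf8Limit` are hypothetical names, and the limit value is an assumption based on Arrow's UTF8 type using 32-bit signed offsets. The limit Doris actually configures may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Assumed limit: Arrow's utf8 type stores int32 offsets, so one array can
// address at most 2^31 - 1 bytes. The constant used in Doris may be smaller.
constexpr std::uint64_t kAssumedUtf8Limit = 2147483647ULL;

// Hypothetical stand-in for a column interface; only byte_size() matters here.
struct ColumnLike {
    virtual ~ColumnLike() = default;
    virtual std::uint64_t byte_size() const = 0;
};

// Dummy column that merely reports a size, so a test can hit the limit
// branch without allocating gigabytes of string data.
struct DummyColumn final : ColumnLike {
    explicit DummyColumn(std::uint64_t reported) : reported_(reported) {}
    std::uint64_t byte_size() const override { return reported_; }
    std::uint64_t reported_;
};

// Returns an error message when the column reaches the limit, and
// std::nullopt when conversion may proceed.
std::optional<std::string> check_utf8_column_size(
        const std::string& column_name, const ColumnLike& column,
        std::uint64_t limit = kAssumedUtf8Limit) {
    assert(limit > 0);
    const std::uint64_t size = column.byte_size();
    if (size >= limit) {
        return "column '" + column_name + "' byte size " + std::to_string(size) +
               " reaches the Arrow UTF8 limit " + std::to_string(limit) +
               "; consider reducing batch_size";
    }
    return std::nullopt;
}
```

In a real BE change the error would be wrapped in the project's status type rather than returned as an optional string. The sketch only shows the comparison and the three pieces of information the message carries.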
### Release note
Return a clear error when Arrow UTF8 block conversion reaches the supported
column byte-size limit.
### Check List (For Author)
- Test: Unit Test / Manual test
- Ran `./build-support/clang-format.sh`
- Ran `./build-support/check-format.sh`
- Ran `git diff --check origin/master..HEAD`
- Ran `DORIS_HOME=/mnt/disk7/hushenggang/doris ninja -C be/ut_build_ASAN src/format/CMakeFiles/Format.dir/arrow/arrow_block_convertor.cpp.o`
- Ran `DORIS_HOME=/mnt/disk7/hushenggang/doris ninja -C be/ut_build_ASAN test/CMakeFiles/doris_be_test.dir/core/data_type_serde/data_type_serde_arrow_test.cpp.o`
- Ran `DORIS_HOME=/mnt/disk7/hushenggang/doris ninja -C be/ut_build_ASAN test/doris_be_test`
- Ran `./run-be-ut.sh --run --filter=DataTypeSerDeArrowTest.RejectOversizedUtf8ColumnByteSize -j 8`
- Attempted `CLANG_TIDY_BINARY=/mnt/disk6/common/ldb_toolchain_taipan/bin/clang-tidy ./build-support/run-clang-tidy.sh --build-dir be/ut_build_ASAN --base origin/master`, but clang-tidy could not complete its analysis because of existing environment/header issues: an unmatched NOLINTEND in `be/src/core/types.h` and a missing `stddef.h` in the system/toolchain headers. It also reported pre-existing complexity/function-size warnings in `create_test_block`.
- Behavior changed: Yes. Oversized UTF8 Arrow block conversion now returns
InvalidArgument instead of trying the unsupported Large UTF8 conversion path.
- Does this need documentation: No