eldenmoon opened a new pull request, #63718:
URL: https://github.com/apache/doris/pull/63718

   ### What problem does this PR solve?
   
   Issue Number: close DORIS-24846
   
   Related PR: #xxx
   
   Problem Summary: `DataTypeVariantSerDe::write_column_to_arrow` always cast 
the Arrow builder to `arrow::StringBuilder`. During Parquet OUTFILE export, the 
Arrow block converter can switch utf8 columns to `large_utf8` when a batch is 
large. Variant serialization then receives an `arrow::LargeStringBuilder`, and 
the unconditional cast to `arrow::StringBuilder` crashes the BE.
   
   This patch handles both `arrow::StringBuilder` and 
`arrow::LargeStringBuilder` for VARIANT Arrow serialization and adds a BE UT 
that reproduces the LargeStringBuilder path.
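
The dispatch described above can be sketched as follows. This is a minimal illustration, not the code in this patch: `ArrayBuilder`, `StringBuilder`, and `LargeStringBuilder` here are simplified stand-ins for the Arrow classes (the real `Append` returns an `arrow::Status`, and the real builders differ in offset width, 32-bit for utf8 and 64-bit for `large_utf8`), and `append_rows` / `write_variant_rows` are hypothetical names. The point is only that the concrete builder type is checked before use, and an unexpected builder yields an error instead of an invalid cast.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Simplified stand-ins for the Arrow builder hierarchy (assumption: these are
// NOT the real arrow:: classes, only enough structure to show the dispatch).
struct ArrayBuilder {
    virtual ~ArrayBuilder() = default;
};

struct StringBuilder : ArrayBuilder {
    std::vector<std::string> values;
    void Append(const std::string& s) { values.push_back(s); }
};

struct LargeStringBuilder : ArrayBuilder {
    std::vector<std::string> values;
    void Append(const std::string& s) { values.push_back(s); }
};

// Shared row loop, written once and instantiated for either builder type.
template <typename Builder>
void append_rows(Builder& builder, const std::vector<std::string>& rows,
                 std::size_t start, std::size_t end) {
    assert(start <= end && end <= rows.size());
    for (std::size_t i = start; i < end; ++i) {
        builder.Append(rows[i]);
    }
}

// Hypothetical entry point: accepts either string builder and reports an
// unsupported builder by returning false rather than casting blindly.
bool write_variant_rows(ArrayBuilder* builder,
                        const std::vector<std::string>& rows,
                        std::size_t start, std::size_t end) {
    if (auto* small = dynamic_cast<StringBuilder*>(builder)) {
        append_rows(*small, rows, start, end);
        return true;
    }
    if (auto* large = dynamic_cast<LargeStringBuilder*>(builder)) {
        append_rows(*large, rows, start, end);
        return true;
    }
    return false;  // neither utf8 nor large_utf8
}
```

Calling `write_variant_rows` with a `LargeStringBuilder` mirrors the path that the new BE UT exercises: the same rows are appended through the large builder rather than triggering a bad cast.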
   
   ### Release note
   
   Fix BE crash when exporting VARIANT columns to Parquet OUTFILE with large 
Arrow string batches.
   
   ### Check List (For Author)
   
   - Test: Unit Test
       - `./run-be-ut.sh --run --filter='DataTypeSerDeTest.VariantWriteColumnToArrowSupportsLargeString'`
       - `./run-be-ut.sh --run --filter='DataTypeSerDeTest.*'`
       - `PATH=/mnt/disk1/claude-max/ldb_toolchain16/bin:$PATH build-support/check-format.sh`
   - Behavior changed: Yes. VARIANT Arrow serialization now supports 
`large_utf8` builders instead of aborting on a bad builder cast.
   - Does this need documentation: No
   
   ### Notes
   
   `build-support/run-clang-tidy.sh --build-dir be/ut_build_ASAN --base upstream/master` 
was attempted but is blocked by existing diagnostics in this path, including an 
unmatched `NOLINTEND` in `core/types.h` and pre-existing 
modernize/readability findings in `data_type_variant_serde.cpp` / 
`data_type_serde_test.cpp`. The one new signed/unsigned warning introduced 
while developing this patch was fixed before the final tests.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

