github-actions[bot] commented on code in PR #64694:
URL: https://github.com/apache/doris/pull/64694#discussion_r3452969480


##########
be/src/format/arrow/arrow_block_convertor.cpp:
##########
@@ -102,8 +104,16 @@ Status FromBlockToRecordBatchConverter::convert(std::shared_ptr<arrow::RecordBat
         if (!arrow_st.ok()) {
             return to_doris_status(arrow_st);
         }
+        if (config::enable_arrow_validate_full) {
+            RETURN_IF_ERROR(checkArrowStatus(_arrays[_cur_field_idx]->ValidateFull(),
+                                             _block.get_by_position(idx).name,
+                                             _arrays[_cur_field_idx]->type()->name()));
+        }
     }
     *out = arrow::RecordBatch::Make(_schema, actual_rows, std::move(_arrays));

Review Comment:
   This full-batch validation conflicts with the existing large-string fallback 
above. For string fields the schema is still built as `arrow::utf8()`, but when 
`column->byte_size() >= MAX_ARROW_UTF8` the converter switches only the 
builder/array to `arrow::large_utf8()`. `RecordBatch::ValidateFull()` then 
validates that LargeString array against the original UTF8 field, so enabling 
`enable_arrow_validate_full` can fail on the very fallback this converter uses 
for large string columns. Please update the output schema when the array is 
promoted, or avoid validating a batch whose schema no longer matches the arrays.
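
   The first suggestion (keep the schema in step with promoted arrays) can be 
   sketched as below. This is a minimal model using only the standard library, 
   not Arrow or Doris code: `TypeId`, `Field`, `Array`, `kMaxUtf8Bytes`, 
   `choose_string_type`, `schema_matches`, and `sync_schema` are all 
   hypothetical names, and the threshold value is an assumption standing in for 
   `MAX_ARROW_UTF8`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for Arrow's type, field, and array; not the Arrow API.
enum class TypeId { Utf8, LargeUtf8 };

struct Field {
    std::string name;
    TypeId type;
};

struct Array {
    TypeId type;
    std::size_t byte_size;
};

// Assumed threshold playing the role of MAX_ARROW_UTF8 (value is illustrative).
constexpr std::size_t kMaxUtf8Bytes = 1ULL << 31;

// Mirrors the fallback described above: large columns get the large type.
TypeId choose_string_type(std::size_t byte_size) {
    return byte_size >= kMaxUtf8Bytes ? TypeId::LargeUtf8 : TypeId::Utf8;
}

// Returns true only if every array's type equals its schema field's type.
bool schema_matches(const std::vector<Field>& schema,
                    const std::vector<Array>& arrays) {
    if (schema.size() != arrays.size()) return false;
    for (std::size_t i = 0; i < schema.size(); ++i) {
        if (schema[i].type != arrays[i].type) return false;
    }
    return true;
}

// Rewrites each field's type to the type of the array actually built for it,
// so a promoted array no longer disagrees with the schema.
// Precondition: schema and arrays have the same length.
void sync_schema(std::vector<Field>& schema, const std::vector<Array>& arrays) {
    assert(schema.size() == arrays.size());
    for (std::size_t i = 0; i < schema.size(); ++i) {
        schema[i].type = arrays[i].type;
    }
}
```

   In this model, a column at or above the threshold produces a `LargeUtf8` 
   array while the field still says `Utf8`, so `schema_matches` returns false 
   until `sync_schema` runs. In Arrow C++ the corresponding step would 
   presumably rebuild the schema (for example via `arrow::Field::WithType` and 
   `arrow::Schema::SetField`) before calling `RecordBatch::Make`; that mapping 
   is an assumption and is not taken from the PR.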



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

