zclllyybb commented on issue #64124:
URL: https://github.com/apache/doris/issues/64124#issuecomment-4623816595

   Breakwater-GitHub-Analysis-Slot: slot_8f7fd7aad58d
   
   Thanks for the detailed report. I checked the current Doris code path and 
this looks like a valid BE-side Iceberg sorted-write bug, not an OSS/catalog 
symptom.
   
   Code path that matches the crash:
   
   - An Iceberg table with `ORDER BY` is routed through `VIcebergSortWriter`.
   - `VIcebergSortWriter::write()` appends the incoming block into 
`FullSorter`, and `_flush_to_file()` runs `do_sort()` -> 
`prepare_for_read(false)` -> `_write_sorted_data()` -> 
`_close_current_writer_and_open_next()` -> `_sorter->reset()` when the target 
file size is reached.
   - `FullSorter::reset()` only delegates to `MergeSorterState::reset()`.
   - `MergeSorterState::reset()` currently clears `_sorted_blocks`, rebuilds 
the unsorted block, and resets `_in_mem_sorted_bocks_size`, but it does not 
reset the merge `_queue` or `_num_rows`.
   - `SortingQueueBatch::update_batch_size()` assumes the queue entries hold 
valid cursors. The reported stack at 
`SortingQueueBatch<MergeSortCursor>::update_batch_size()` is consistent with an 
invalid/stale merge cursor being reused after a sorted-writer flush cycle.
   
   I did not find a public PR that already fixes this, and current upstream 
still appears to have the incomplete reset behavior. The proposed fix direction 
is reasonable:
   
   - reset the merge queue in `MergeSorterState::reset()`;
   - reset `_num_rows` in the same state reset;
   - add a regression test that forces an Iceberg sorted writer to flush at 
least twice in one writer task, preferably by lowering 
`iceberg_write_target_file_size_bytes` rather than requiring TPC-DS scale data.
   
   There is also a secondary issue in `VIcebergSortWriter::write()`: 
`_update_spill_block_batch_row_count(block)` is called after 
`_sorter->append_block(&block)`, while `FullSorter::append_block()` clears the 
source block after copying it. That means the spill batch size calculation can 
see zero rows. Moving the row-size calculation before `append_block()` or 
capturing the input row/byte counts before the append is the right fix for that 
part.
   
   For the PR, please include the exact Doris build/commit used if it differs 
from `635a6e1c302`, the session value of `iceberg_write_target_file_size_bytes` 
if it was customized, and a small reproducer/test with a low target file size. 
Until the fix is merged, the practical workaround is to avoid sorted Iceberg 
writes that require multiple flush cycles in one writer task, or temporarily 
increase the target file size enough to avoid repeated flushes.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to