github-actions[bot] commented on code in PR #63434:
URL: https://github.com/apache/doris/pull/63434#discussion_r3272152208


##########
be/src/exec/rowid_fetcher.cpp:
##########
@@ -1072,22 +1077,17 @@ Status RowIdStorageReader::read_doris_format_row(
             return Status::InternalError("Tablet {} does not have row store 
for all columns",
                                          tablet->tablet_id());
         }
-        for (auto row_id : row_ids) {
-            RowLocation loc(rowset_id, segment->id(), 
cast_set<uint32_t>(row_id));
-            row_store_read_struct.row_store_buffer.clear();
-            RETURN_IF_ERROR(scope_timer_run(
-                    [&]() {
-                        return tablet->lookup_row_data({}, loc, rowset, stats,
-                                                       
row_store_read_struct.row_store_buffer);
-                    },
-                    lookup_row_data_ms));
+        auto row_store_rows = ColumnString::create();
+        RETURN_IF_ERROR(scope_timer_run(
+                [&]() {
+                    return tablet->lookup_row_data({}, segment_id, row_ids, 
rowset, stats,
+                                                   *row_store_rows);

Review Comment:
   This batched row-store read can return wrong rows when `row_ids` are not 
monotonically increasing. `read_batch_doris_format_row()` groups only 
contiguous equal `file_id`s from the request; it does not sort by row id. The 
request is built in result-row order 
(`MaterializationSharedState::create_muiltget_result()` appends 
`row_location.row_id` as rows arrive), so a batch for the same segment can be 
in arbitrary order such as `[100, 1]`. `FileColumnIterator::read_by_rowids()` 
seeks to `rowids[total_read_count]` and then advances within the current page, 
so it assumes the remaining row ids are ordered within/after that page; the old 
per-row loop did not have this assumption. Please either preserve the old 
one-by-one path for unordered input, or sort `(row_id, original_index)` for the 
iterator and restore the original output order before `jsonb_to_block()`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to