github-actions[bot] commented on code in PR #64133:
URL: https://github.com/apache/doris/pull/64133#discussion_r3473385233


##########
be/src/storage/rowset/beta_rowset_reader.cpp:
##########
@@ -342,13 +342,13 @@ Status BetaRowsetReader::_init_iterator() {
                 }
             }
         }
-        if (_read_options.binlog_lsn_idx != -1) {
-            sequence_loc = _read_options.binlog_lsn_idx;

Review Comment:
   Using TSO as the merge sequence here loses the per-row ordering that 
row-binlog DETAIL/MIN_DELTA depends on. The TSO column is not stored per row: 
`RowBinlogSegmentWriter` writes a placeholder, and 
`SegmentIterator::_update_tso_col_if_needed()` replaces every row in a rowset 
with the same commit TSO. For same-key changes that land in different segments 
of the same rowset, the merge comparator sees equal key and equal TSO, then 
falls back to `data_id()`. These inputs are `LazyInitSegmentIterator`s, which 
do not forward the wrapped `SegmentIterator::data_id()` and therefore all 
return the base value 0. That leaves the heap with no deterministic 
tie-breaker, so `BlockReader` can process a later update/delete before an 
earlier one and produce wrong DETAIL/MIN_DELTA rows. Please keep LSN as a 
secondary/per-row merge key for row-binlog scans, or otherwise provide a proven 
stable ordering for same-key rows within a commit TSO.



##########
be/src/load/memtable/memtable_flush_executor.cpp:
##########
@@ -415,6 +390,17 @@ void FlushToken::_flush_memtable_impl(RowsetWriter* 
flush_writer, MemTable* memt
             // }};
             std::shared_ptr<Block> flush_block;
             RETURN_IF_ERROR(_memtable2block(memtable, shared_memtable, 
flush_block));
+            if (flush_writer->context().write_binlog_opt().enable) {
+                const auto& memtable_lsns = memtable->row_binlog_lsns();

Review Comment:
   This `CHECK(!memtable_lsns.empty())` makes a valid empty-rowset close crash 
for row-binlog tables. `TabletsChannel::close()` closes every writer in a 
touched partition, and `DeltaWriter::close()` explicitly initializes writers 
with no data so tablets that received zero rows still create an empty rowset 
for version continuity. In that path `_memtable2block()` returns a zero-row 
block and the sidecar LSN vector is empty; the rowset writer already treats 
zero-row memtables as OK, but this check fires before `flush_memtable()` can do 
so. Please skip LSN registration when `flush_block->rows() == 0` instead of 
requiring a non-empty sidecar.



##########
gensrc/proto/olap_file.proto:
##########
@@ -540,11 +540,14 @@ message TabletSchemaPB {
     // storage_format; the fallback path lives in TabletSchema::init_from_pb.
     optional TabletStorageFormatPB storage_format = 36;
 
-    // cid of the binlog timestamp column (__DORIS_BINLOG_TIMESTAMP__).
-    optional int32 binlog_timestamp_col_idx = 37 [default = -1];
-
     // cid of the binlog lsn column (__DORIS_BINLOG_LSN__).
-    optional int32 binlog_lsn_col_idx = 38 [default = -1];
+    optional int32 binlog_lsn_col_idx = 37 [default = -1];

Review Comment:
   This renumbers persisted tablet-schema fields in a way that corrupts 
existing row-binlog metadata on upgrade. In the base `TabletSchemaPB`, field 37 
was `binlog_timestamp_col_idx` and field 38 was `binlog_lsn_col_idx`; 
`TabletSchemaCloudPB` had the same timestamp/LSN fields at 38 and 39. After 
this change, new code decodes the old timestamp tag as `binlog_lsn_col_idx` and 
the old LSN tag as `binlog_op_col_idx` via `TabletSchema::init_from_pb()` and 
the cloud conversion helpers. Any tablet meta written before the upgrade will 
therefore load with LSN pointing at the old timestamp column and op pointing at 
the old LSN column, and `RowBinlogSegmentWriter` will build the row-binlog 
prefix from the wrong position. Please keep existing field numbers stable, 
reserve removed fields, and add new field numbers plus an explicit 
fallback/migration from the old timestamp/LSN fields.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to