github-actions[bot] commented on code in PR #61895:
URL: https://github.com/apache/doris/pull/61895#discussion_r3013147402


##########
be/src/core/column/column_variant.cpp:
##########
@@ -737,12 +747,30 @@ void ColumnVariant::check_consistency() const {
                                "unmatched doc snapshot column:, expeted rows: 
{}, but meet: {}",
                                num_rows, serialized_doc_value_column->size());
     }
-    // const auto& offsets = serialized_doc_value_column_offsets();
-    // size_t off = offsets[num_rows - 1];
-    // if (off > 0 && subcolumns.size() != 1) {
-    //     throw doris::Exception(doris::ErrorCode::INTERNAL_ERROR,
-    //                            "doc snapshot column offsets is not empty, 
but subcolumns size is not 1");
-    // }
+    if (_enable_doc_mode && num_rows > 0) {
+        // doc mode invariants:
+        // - only root subcolumn (size == 1)
+        // - sparse column is empty
+        // - subcolumns and doc_value are mutually exclusive
+        if (subcolumns.size() != 1) {
+            throw doris::Exception(doris::ErrorCode::INTERNAL_ERROR,
+                                   "doc mode: should only have root, but 
subcolumns size is {}",
+                                   subcolumns.size());
+        }
+        const auto& sparse_offs = serialized_sparse_column_offsets();
+        if (sparse_offs[num_rows - 1] > 0) {
+            throw doris::Exception(doris::ErrorCode::INTERNAL_ERROR,
+                                   "doc mode: should not have sparse data");
+        }
+    } else {
+        const auto& offsets = serialized_doc_value_column_offsets();
+        size_t off = offsets[num_rows - 1];

Review Comment:
   **Bug: Out-of-bounds access when `num_rows == 0` and `_enable_doc_mode == 
false`**
   
   The `if` branch correctly guards with `_enable_doc_mode && num_rows > 0`, 
but the `else` branch is entered when `!(_enable_doc_mode && num_rows > 0)`, 
which includes the case `_enable_doc_mode == false && num_rows == 0`. In that 
case, `offsets[num_rows - 1]` becomes `offsets[SIZE_MAX]` (size_t underflow), 
causing undefined behavior.
   
   Fix: add `num_rows > 0` guard to the `else` branch:
   ```cpp
   if (_enable_doc_mode && num_rows > 0) {
       // ... doc mode checks ...
   } else if (num_rows > 0) {
       // ... non-doc-mode checks ...
   }
   ```



##########
be/src/core/column/column_variant.cpp:
##########
@@ -793,17 +821,21 @@ void ColumnVariant::for_each_subcolumn(ColumnCallback 
callback) {
 
 void ColumnVariant::insert_from(const IColumn& src, size_t n) {
     const auto* src_v = check_and_get_column<ColumnVariant>(src);
-    // only root, quick insert
-    if (src_v->get_subcolumns().size() == 1 && get_subcolumns().size() == 1) {
+    ENABLE_CHECK_CONSISTENCY(src_v);
+    ENABLE_CHECK_CONSISTENCY(this);
+    // doc mode fast path: both sides root-only, direct copy root + sparse + 
doc_value
+    if (_enable_doc_mode) {
+        DCHECK(src_v->_enable_doc_mode) << "dst is doc mode but src is not";
         FieldWithDataType field;

Review Comment:
   **Behavior Change: Fast path narrowed from structural check to flag check**
   
   The old condition was `src_v->get_subcolumns().size() == 1 && 
get_subcolumns().size() == 1`, which provided a fast path for *any* root-only 
ColumnVariant (regardless of doc mode). The new condition `_enable_doc_mode` 
restricts this fast path to doc-mode columns only.
   
   For non-doc-mode columns with `subcolumns.size() == 1` (e.g., after 
finalization where subcolumns were demoted to sparse), the code now falls 
through to `try_insert((*src_v)[n])` which:
   1. Materializes a `VariantMap` from `operator[]` (expensive: deserializes 
sparse data)
   2. Re-inserts each key via `add_sub_column` (bypasses 
`_max_subcolumns_count` limits)
   3. Always inserts a default into sparse column (line 878), losing the 
original sparse data layout
   
   This is a semantic change, not just a refactor. Consider preserving the old 
condition as an additional fast path:
   ```cpp
   if (_enable_doc_mode) {
       // doc mode fast path ...
   } else if (src_v->get_subcolumns().size() == 1 && get_subcolumns().size() == 
1) {
       // root-only fast path (copy root + sparse + doc_value directly)
       ...
   } else {
       try_insert((*src_v)[n]);
   }
   ```



##########
be/src/storage/tablet/tablet_meta.cpp:
##########
@@ -561,8 +561,8 @@ void TabletMeta::init_column_from_tcolumn(uint32_t 
unique_id, const TColumn& tco
     if (tcolumn.__isset.variant_sparse_hash_shard_count) {
         
column->set_variant_sparse_hash_shard_count(tcolumn.variant_sparse_hash_shard_count);
     }
-    if (tcolumn.__isset.variant_enable_doc_mode) {
-        column->set_variant_enable_doc_mode(tcolumn.variant_enable_doc_mode);
+    if (tcolumn.column_type.__isset.variant_enable_doc_mode) {
+        
column->set_variant_enable_doc_mode(tcolumn.column_type.variant_enable_doc_mode);
     }

Review Comment:
   **Rolling Upgrade Compatibility: Missing fallback to deprecated TColumn 
field**
   
   This now reads only from `tcolumn.column_type.variant_enable_doc_mode` 
(field 7 on `TColumnType`). During a rolling upgrade, an old FE may only set 
`tcolumn.variant_enable_doc_mode` (field 26 on `TColumn`) without populating 
the new `TColumnType` field. In that scenario, the new BE would silently drop 
`enable_doc_mode = true`, defaulting to `false`.
   
   Suggested fix: add fallback to the deprecated field:
   ```cpp
   if (tcolumn.column_type.__isset.variant_enable_doc_mode) {
       
column->set_variant_enable_doc_mode(tcolumn.column_type.variant_enable_doc_mode);
   } else if (tcolumn.__isset.variant_enable_doc_mode) {
       column->set_variant_enable_doc_mode(tcolumn.variant_enable_doc_mode);
   }
   ```
   This matches the pattern used by other fields in this method that were 
migrated from `TColumn` to `TColumnType`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to