github-actions[bot] commented on code in PR #63001:
URL: https://github.com/apache/doris/pull/63001#discussion_r3192769770


##########
be/src/core/data_type/data_type_map.cpp:
##########
@@ -145,8 +145,8 @@ const char* DataTypeMap::deserialize(const char* buf, 
MutableColumnPtr* column,
     memcpy(map_offsets.data(), buf, sizeof(ColumnArray::Offset64) * 
real_have_saved_num);
     buf += sizeof(ColumnArray::Offset64) * real_have_saved_num;
     // key value
-    auto nested_keys_column = map_column->get_keys_ptr()->assume_mutable();
-    auto nested_values_column = map_column->get_values_ptr()->assume_mutable();
+    auto nested_keys_column = std::move(*map_column->get_keys_ptr()).mutate();
+    auto nested_values_column = 
std::move(*map_column->get_values_ptr()).mutate();

Review Comment:
   Same COW ownership issue here: `mutate()` may return cloned key/value 
children when the map subcolumns are shared, but the deserialized 
`nested_keys_column` and `nested_values_column` are dropped instead of being 
written back to `map_column`. That leaves updated offsets with stale 
keys/values. Please assign both mutated children back to `get_keys_ptr()` and 
`get_values_ptr()` after deserializing them.



##########
be/src/core/data_type/data_type_array.cpp:
##########
@@ -126,7 +126,7 @@ const char* DataTypeArray::deserialize(const char* buf, 
MutableColumnPtr* column
     memcpy(offsets.data(), buf, sizeof(ColumnArray::Offset64) * 
real_have_saved_num);
     buf += sizeof(ColumnArray::Offset64) * real_have_saved_num;
     // children
-    auto nested_column = data_column->get_data_ptr()->assume_mutable();
+    auto nested_column = std::move(*data_column->get_data_ptr()).mutate();
     buf = get_nested_type()->deserialize(buf, &nested_column, be_exec_version);

Review Comment:
   This now mutates a local `nested_column`, but if `get_data_ptr()` is shared, 
`mutate()` clones the child and deserialization fills only that clone. Since 
the clone is never assigned back to the `ColumnArray`, the array keeps the old 
child while offsets have already been resized/copied, producing an inconsistent 
deserialized array. Please store `nested_column` back into 
`data_column->get_data_ptr()` after `get_nested_type()->deserialize(...)`.



##########
be/src/core/data_type/data_type_struct.cpp:
##########
@@ -223,7 +223,7 @@ const char* DataTypeStruct::deserialize(const char* buf, 
MutableColumnPtr* colum
     auto* struct_column = assert_cast<ColumnStruct*>(origin_column);
     DCHECK(elems.size() == struct_column->tuple_size());
     for (size_t i = 0; i < elems.size(); ++i) {
-        auto child_column = struct_column->get_column_ptr(i)->assume_mutable();
+        auto child_column = 
std::move(*struct_column->get_column_ptr(i)).mutate();
         buf = elems[i]->deserialize(buf, &child_column, be_exec_version);

Review Comment:
   If a struct child is shared, this `mutate()` clones it; 
`elems[i]->deserialize()` then writes into the clone, but the clone is 
discarded at the end of the loop iteration. The parent `ColumnStruct` continues 
to reference the old child. Please write `child_column` back to 
`struct_column->get_column_ptr(i)` after deserialization.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to