Copilot commented on code in PR #61996:
URL: https://github.com/apache/doris/pull/61996#discussion_r3026622018


##########
be/src/core/data_type_serde/data_type_datev2_serde.cpp:
##########
@@ -515,4 +517,43 @@ template Status 
DataTypeDateV2SerDe::from_decimal_strict_mode_batch<DataTypeDeci
 template Status 
DataTypeDateV2SerDe::from_decimal_strict_mode_batch<DataTypeDecimal256>(
         const DataTypeDecimal256::ColumnType& decimal_col, IColumn& 
target_col) const;
 
+Status DataTypeDateV2SerDe::write_column_to_pb(const IColumn& column, PValues& 
result,
+                                               int64_t start, int64_t end) 
const {
+    auto row_count = cast_set<int>(end - start);
+    auto* ptype = result.mutable_type();
+    const auto* col = check_and_get_column<ColumnType>(column);
+    auto& data = col->get_data();
+    ptype->set_id(PGenericType::UINT32);
+    auto* values = result.mutable_uint32_value();
+    values->Reserve(row_count);
+    values->Add((uint32_t*)data.begin() + start, (uint32_t*)data.begin() + 
end);

Review Comment:
   write_column_to_pb() is adding values by casting the column’s DateV2Value 
buffer to `uint32_t*` and passing it to protobuf. Even though DateV2Value 
stores an underlying integer, reinterpreting the object array as a `uint32_t` 
array relies on object layout and can violate strict-aliasing rules (undefined 
behavior under optimization). Prefer filling the repeated field by iterating 
and pushing `to_date_int_val()` (or an equivalent accessor) per element, or 
using a safe bit-cast/memcpy-based conversion instead of pointer reinterprets.
   ```suggestion
       for (int64_t i = start; i < end; ++i) {
           values->Add(data[i].to_date_int_val());
       }
   ```



##########
be/src/core/data_type_serde/data_type_datetimev2_serde.cpp:
##########
@@ -597,4 +599,47 @@ template Status 
DataTypeDateTimeV2SerDe::from_decimal_strict_mode_batch<DataType
         const DataTypeDecimal128::ColumnType& decimal_col, IColumn& 
target_col) const;
 template Status 
DataTypeDateTimeV2SerDe::from_decimal_strict_mode_batch<DataTypeDecimal256>(
         const DataTypeDecimal256::ColumnType& decimal_col, IColumn& 
target_col) const;
+
+Status DataTypeDateTimeV2SerDe::write_column_to_pb(const IColumn& column, 
PValues& result,
+                                                   int64_t start, int64_t end) 
const {
+    auto row_count = cast_set<int>(end - start);
+    auto* ptype = result.mutable_type();
+    const auto* col = check_and_get_column<ColumnType>(column);
+    auto& data = col->get_data();
+    ptype->set_id(PGenericType::UINT64);
+    auto* values = result.mutable_uint64_value();
+    values->Reserve(row_count);
+    values->Add((uint64_t*)data.begin() + start, (uint64_t*)data.begin() + 
end);

Review Comment:
   write_column_to_pb() casts the `DateV2Value<DateTimeV2ValueType>` data 
buffer to `uint64_t*` and passes it to protobuf. This relies on the in-memory 
layout of DateV2Value matching a raw `uint64_t` array and can violate 
strict-aliasing rules (undefined behavior). Safer approach: append values by 
iterating and using `to_date_int_val()` (or a safe bit-cast/memcpy) to obtain 
the underlying `uint64_t` representation.
   ```suggestion
       for (int64_t i = start; i < end; ++i) {
           values->Add(data[i].to_date_int_val());
       }
   ```



##########
be/src/core/data_type_serde/data_type_timestamptz_serde.cpp:
##########
@@ -250,4 +253,97 @@ std::string DataTypeTimeStampTzSerDe::to_olap_string(const 
Field& field) const {
     return CastToString::from_timestamptz(field.get<TYPE_TIMESTAMPTZ>(), 6);
 }
 
+Status DataTypeTimeStampTzSerDe::write_column_to_pb(const IColumn& column, 
PValues& result,
+                                                    int64_t start, int64_t 
end) const {
+    auto row_count = cast_set<int>(end - start);
+    auto* ptype = result.mutable_type();
+    const auto* col = check_and_get_column<ColumnType>(column);
+    auto& data = col->get_data();
+    ptype->set_id(PGenericType::UINT64);
+    auto* values = result.mutable_uint64_value();
+    values->Reserve(row_count);
+    values->Add((uint64_t*)data.begin() + start, (uint64_t*)data.begin() + 
end);

Review Comment:
   write_column_to_pb() populates the protobuf repeated field by reinterpreting 
the `TimestampTzValue` object buffer as a `uint64_t*`. This assumes object 
layout compatibility and can trigger strict-aliasing UB under optimization. 
Consider appending the underlying integer via 
`TimestampTzValue::to_date_int_val()` (or a safe bit-cast/memcpy) per row 
instead of pointer casting.
   ```suggestion
       for (int64_t i = start; i < end; ++i) {
           values->Add(binary_cast<TimestampTzValue, UInt64>(data[i]));
       }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to