Copilot commented on code in PR #61996:
URL: https://github.com/apache/doris/pull/61996#discussion_r3026622018
##########
be/src/core/data_type_serde/data_type_datev2_serde.cpp:
##########
@@ -515,4 +517,43 @@ template Status
DataTypeDateV2SerDe::from_decimal_strict_mode_batch<DataTypeDeci
template Status
DataTypeDateV2SerDe::from_decimal_strict_mode_batch<DataTypeDecimal256>(
const DataTypeDecimal256::ColumnType& decimal_col, IColumn&
target_col) const;
+Status DataTypeDateV2SerDe::write_column_to_pb(const IColumn& column, PValues&
result,
+ int64_t start, int64_t end)
const {
+ auto row_count = cast_set<int>(end - start);
+ auto* ptype = result.mutable_type();
+ const auto* col = check_and_get_column<ColumnType>(column);
+ auto& data = col->get_data();
+ ptype->set_id(PGenericType::UINT32);
+ auto* values = result.mutable_uint32_value();
+ values->Reserve(row_count);
+ values->Add((uint32_t*)data.begin() + start, (uint32_t*)data.begin() +
end);
Review Comment:
write_column_to_pb() is adding values by casting the column’s DateV2Value
buffer to `uint32_t*` and passing it to protobuf. Even though DateV2Value
stores an underlying integer, reinterpreting the object array as a `uint32_t`
array relies on object layout and can violate strict-aliasing rules (undefined
behavior under optimization). Prefer filling the repeated field by iterating
and pushing `to_date_int_val()` (or an equivalent accessor) per element, or
using a safe bit-cast/memcpy-based conversion instead of pointer reinterprets.
```suggestion
for (int64_t i = start; i < end; ++i) {
values->Add(data[i].to_date_int_val());
}
```
##########
be/src/core/data_type_serde/data_type_datetimev2_serde.cpp:
##########
@@ -597,4 +599,47 @@ template Status
DataTypeDateTimeV2SerDe::from_decimal_strict_mode_batch<DataType
const DataTypeDecimal128::ColumnType& decimal_col, IColumn&
target_col) const;
template Status
DataTypeDateTimeV2SerDe::from_decimal_strict_mode_batch<DataTypeDecimal256>(
const DataTypeDecimal256::ColumnType& decimal_col, IColumn&
target_col) const;
+
+Status DataTypeDateTimeV2SerDe::write_column_to_pb(const IColumn& column,
PValues& result,
+ int64_t start, int64_t end)
const {
+ auto row_count = cast_set<int>(end - start);
+ auto* ptype = result.mutable_type();
+ const auto* col = check_and_get_column<ColumnType>(column);
+ auto& data = col->get_data();
+ ptype->set_id(PGenericType::UINT64);
+ auto* values = result.mutable_uint64_value();
+ values->Reserve(row_count);
+ values->Add((uint64_t*)data.begin() + start, (uint64_t*)data.begin() +
end);
Review Comment:
write_column_to_pb() casts the `DateV2Value<DateTimeV2ValueType>` data
buffer to `uint64_t*` and passes it to protobuf. This relies on the in-memory
layout of DateV2Value matching a raw `uint64_t` array and can violate
strict-aliasing rules (undefined behavior). Safer approach: append values by
iterating and using `to_date_int_val()` (or a safe bit-cast/memcpy) to obtain
the underlying `uint64_t` representation.
```suggestion
for (int64_t i = start; i < end; ++i) {
values->Add(data[i].to_date_int_val());
}
```
##########
be/src/core/data_type_serde/data_type_timestamptz_serde.cpp:
##########
@@ -250,4 +253,97 @@ std::string DataTypeTimeStampTzSerDe::to_olap_string(const
Field& field) const {
return CastToString::from_timestamptz(field.get<TYPE_TIMESTAMPTZ>(), 6);
}
+Status DataTypeTimeStampTzSerDe::write_column_to_pb(const IColumn& column,
PValues& result,
+ int64_t start, int64_t
end) const {
+ auto row_count = cast_set<int>(end - start);
+ auto* ptype = result.mutable_type();
+ const auto* col = check_and_get_column<ColumnType>(column);
+ auto& data = col->get_data();
+ ptype->set_id(PGenericType::UINT64);
+ auto* values = result.mutable_uint64_value();
+ values->Reserve(row_count);
+ values->Add((uint64_t*)data.begin() + start, (uint64_t*)data.begin() +
end);
Review Comment:
write_column_to_pb() populates the protobuf repeated field by reinterpreting
the `TimestampTzValue` object buffer as a `uint64_t*`. This assumes object
layout compatibility and can trigger strict-aliasing UB under optimization.
Consider appending the underlying integer via
`TimestampTzValue::to_date_int_val()` (or a safe bit-cast/memcpy) per row
instead of pointer casting.
```suggestion
for (int64_t i = start; i < end; ++i) {
values->Add(binary_cast<TimestampTzValue, UInt64>(data[i]));
}
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]