heguanhui commented on issue #64149: URL: https://github.com/apache/doris/issues/64149#issuecomment-4648952738
> I can reproduce the code-level concern from the current upstream source inspection. > > `DataTypeQuantileStateSerDe::write_column_to_pb()` currently writes `column.get_data_at(row_num)` into `PValues.bytes_value`, and `read_column_from_pb()` feeds those bytes back through `column.insert_data()`. For `ColumnQuantileState`, `get_data_at()` exposes the in-memory `QuantileState` object bytes, while `insert_data()` reconstructs from `reinterpret_cast<const QuantileState*>`. That is not a stable wire format because `QuantileState` contains owning heap-backed members (`std::shared_ptr<TDigest>` and `std::vector<double>`). > > So this looks like a real bug, not just a unit-test comparison issue. It can appear benign only when the original object is still alive in the same process, but it is invalid for protobuf transport because the encoded bytes contain process-local pointer/control-block state rather than the logical quantile state. The same serde is reachable from protobuf-based paths such as RPC function argument/result conversion and Nereids fold-constant result content, so the impact is not limited to the test helper. > > The proposed fix direction is correct: > > * In `write_column_to_pb()`, set `result.mutable_type()->set_id(PGenericType::QUANTILE_STATE)`. > * Serialize each `QuantileState` with `get_serialized_size()` plus `QuantileState::serialize()`, as the JSONB/Arrow/ORC paths already do. > * In `read_column_from_pb()`, construct a fresh `QuantileState`, call `deserialize(Slice(...))`, and insert the logical value into `ColumnQuantileState`. > * Add/keep BE coverage for empty, single, explicit, and TDigest states if possible, and assert both the protobuf type id and logical percentile values after the round trip. Raw `get_data_at()` equality is not a good success criterion for a type with heap-backed members. > > I did not find an already-open PR for this title/topic at the time of checking, so this issue is actionable for a small BE serde fix. > > Breakwater-GitHub-Analysis-Slot: slot_70857bce599d Hi boss, could you help reviewing this pr [https://github.com/apache/doris/pull/64152](url) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
