shyjsarah commented on PR #349: URL: https://github.com/apache/paimon-rust/pull/349#issuecomment-4598311750
> Thanks for the fix. The non-partitioned and empty-entry paths are covered now, but one write path can still emit zero-length min/max stats.
>
> In `crates/paimon/src/table/table_commit.rs`, `compute_partition_stats` still calls `datums_to_binary_row` for the aggregated partition min/max rows. If every value for every partition field is `NULL`, both `mins` and `maxs` remain all `None`, so `datums_to_binary_row` returns `Vec::new()` while `null_counts` is non-empty. Java `SimpleStats.fromRow` still calls `SerializationUtils.deserializeBinaryRow(row.getBinary(0/1))` before it checks for `EMPTY_STATS`, so a partitioned table whose committed partition key is null can hit the same `BufferUnderflowException` this PR is fixing.
>
> Could you make this path write decodable min/max bytes too, for example a serialized BinaryRow with the partition arity and null bits set, and add a regression test for a partitioned table with an all-null partition key?
>
> Verification: `cargo test -p paimon` passes locally on this PR.

Good catch, pushed b23f870 + dc62a5e: stops routing `compute_partition_stats`'s min/max through `datums_to_binary_row` (whose all-`None` → `vec![]` shortcut is a `partitions_to_bytes` sentinel, not safe here when `null_counts` is non-empty), uses a new `build_partition_stats_row` that always emits an arity-N BinaryRow with null bits set, and adds a regression test for the all-null partition row case.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
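
For readers following the thread, here is a minimal sketch of what "an arity-N BinaryRow with null bits set" could look like on the wire. It is not the PR's `build_partition_stats_row`. The helper names are hypothetical, and the layout is an assumption modeled on the Flink-style BinaryRow format: a 4-byte big-endian arity prefix, a null bitset that reserves 8 header bits and is padded to 8-byte words, and one 8-byte slot per field. Check it against the actual serializer before relying on it.

```rust
/// Assumed number of header bits that precede the per-field null bits.
const HEADER_BITS: usize = 8;

/// Width of the null bitset in bytes, rounded up to whole 8-byte words
/// (assumed layout, see the lead-in).
fn null_bitset_width_bytes(arity: usize) -> usize {
    ((arity + 63 + HEADER_BITS) / 64) * 8
}

/// Hypothetical helper: serialize a row of `arity` fields that are all NULL.
/// The result is never empty, so a reader that first decodes a 4-byte
/// prefix does not run out of input.
fn all_null_row_bytes(arity: usize) -> Vec<u8> {
    let bitset = null_bitset_width_bytes(arity);
    let fixed_part = bitset + arity * 8;

    let mut out = Vec::with_capacity(4 + fixed_part);
    // Assumed 4-byte big-endian arity prefix.
    out.extend_from_slice(&(arity as u32).to_be_bytes());
    // Zero-filled bitset plus one zeroed 8-byte slot per field.
    out.resize(4 + fixed_part, 0);

    // Mark every field as NULL; field i lives at bit (i + HEADER_BITS).
    for i in 0..arity {
        let bit = i + HEADER_BITS;
        out[4 + bit / 8] |= 1u8 << (bit % 8);
    }
    out
}
```

Under these assumptions, `all_null_row_bytes(2)` yields 28 bytes: the 4-byte prefix, an 8-byte bitset with two null bits set, and two zeroed 8-byte slots. The point of the sketch is the contrast with the `Vec::new()` sentinel: the all-null case still produces bytes a reader can decode.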
