tonghuaroot opened a new pull request, #364:
URL: https://github.com/apache/paimon-rust/pull/364
## What
`BinaryRow::from_serialized_bytes` currently validates only the 4-byte arity
prefix and then passes the remaining bytes straight to `from_bytes`. A buffer
that carries a valid arity prefix but a body shorter than the null bitmap
therefore decodes "successfully", and the panic surfaces later when a reader
touches the missing null-bitmap byte:
```rust
pub fn is_null_at(&self, pos: usize) -> bool {
let bit_index = pos + Self::HEADER_SIZE_IN_BYTES as usize;
let byte_index = bit_index / 8;
let bit_offset = bit_index % 8;
(self.data[byte_index] & (1 << bit_offset)) != 0 // index panic on a
short buffer
}
```
`from_serialized_bytes` is used throughout the manifest / stats / partition
read path (e.g. `stats.rs`, `table_scan.rs`, `partition_listing.rs`, the
DataFusion system tables), and the decoded row is typically handed to a
consumer that calls `is_null_at` first — `format_partition_value`, predicate
evaluation, `get_datum` — so a truncated or malformed serialized `BinaryRow`
aborts the reader with a bounds panic instead of a recoverable error.
## Change
- In `from_serialized_bytes`, after reading the arity, reject inputs whose
body is shorter than `cal_fix_part_size_in_bytes(arity)` (null bitmap + fixed
part), returning the crate's existing `Error::UnexpectedError` rather than
constructing a row that will panic on read. A negative arity is also rejected,
and the required size is computed in `i64` so an absurd arity in malformed
input cannot overflow the `i32` size math.
- As a second layer of defense, make `is_null_at` index the null bitmap with
`get`, so a short buffer reports the field as not-null and the typed field
readers (already bounds-checked via `read_slice` / `read_byte_at`) return a
graceful `Err` instead of `is_null_at` panicking.
## Tests
Added regression tests in the `binary_row` test module:
- `test_from_serialized_bytes_truncated_body` — valid arity prefix but empty
/ too-short body → `Err`, no panic.
- `test_from_serialized_bytes_negative_arity` — negative arity → `Err`.
- `test_is_null_at_short_buffer_does_not_panic` — `is_null_at` on a buffer
lacking the null bitmap does not panic; the typed reader then returns `Err`.
- `test_from_serialized_bytes_well_formed_decodes` — negative control: a
correctly sized body still decodes and reads back.
`cargo test -p paimon`, `cargo build -p paimon`, `cargo fmt --all --
--check`, and `cargo clippy -p paimon --all-targets -- -D warnings` all pass
locally.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]