tonghuaroot opened a new pull request, #364:
URL: https://github.com/apache/paimon-rust/pull/364

   ## What
   
   `BinaryRow::from_serialized_bytes` currently validates only the 4-byte arity 
prefix and then passes the remaining bytes straight to `from_bytes`. A buffer 
that carries a valid arity prefix but a body shorter than the null bitmap 
therefore decodes "successfully", and the panic surfaces later when a reader 
touches the missing null-bitmap byte:
   
   ```rust
   pub fn is_null_at(&self, pos: usize) -> bool {
       let bit_index = pos + Self::HEADER_SIZE_IN_BYTES as usize;
       let byte_index = bit_index / 8;
       let bit_offset = bit_index % 8;
       (self.data[byte_index] & (1 << bit_offset)) != 0   // index panic on a 
short buffer
   }
   ```
   
   `from_serialized_bytes` is used throughout the manifest / stats / partition 
read path (e.g. `stats.rs`, `table_scan.rs`, `partition_listing.rs`, the 
DataFusion system tables), and the decoded row is typically handed to a 
consumer that calls `is_null_at` first — `format_partition_value`, predicate 
evaluation, `get_datum` — so a truncated or malformed serialized `BinaryRow` 
aborts the reader with a bounds panic instead of a recoverable error.
   
   ## Change
   
   - In `from_serialized_bytes`, after reading the arity, reject inputs whose 
body is shorter than `cal_fix_part_size_in_bytes(arity)` (null bitmap + fixed 
part), returning the crate's existing `Error::UnexpectedError` rather than 
constructing a row that will panic on read. A negative arity is also rejected, 
and the required size is computed in `i64` so an absurd arity in malformed 
input cannot overflow the `i32` size math.
   - As a second layer of defense, make `is_null_at` index the null bitmap with 
`get`, so a short buffer reports the field as not-null and the typed field 
readers (already bounds-checked via `read_slice` / `read_byte_at`) return a 
graceful `Err` instead of `is_null_at` panicking.
   
   ## Tests
   
   Added regression tests in the `binary_row` test module:
   - `test_from_serialized_bytes_truncated_body` — valid arity prefix but empty 
/ too-short body → `Err`, no panic.
   - `test_from_serialized_bytes_negative_arity` — negative arity → `Err`.
   - `test_is_null_at_short_buffer_does_not_panic` — `is_null_at` on a buffer 
lacking the null bitmap does not panic; the typed reader then returns `Err`.
   - `test_from_serialized_bytes_well_formed_decodes` — negative control: a 
correctly sized body still decodes and reads back.
   
   `cargo test -p paimon`, `cargo build -p paimon`, `cargo fmt --all -- 
--check`, and `cargo clippy -p paimon --all-targets -- -D warnings` all pass 
locally.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to