zhaohaidao opened a new issue, #171:
URL: https://github.com/apache/fluss-rust/issues/171

   ### Search before asking
   
   - [x] I searched in the 
[issues](https://github.com/apache/fluss-rust/issues) and found nothing similar.
   
   
   ### Please describe the bug 🐞
   
   ## Summary
   The Rust client uses incorrect slice bounds when reading the 
checksum/schema-id and when computing CRC in `LogRecordBatch`. This can cause 
out-of-bounds panics for minimal header-sized batches and diverges from the 
Java implementation.
   
   ## Impact
   - Potential runtime panic when `self.data` length is only large enough for 
the header.
   - CRC computation uses an incorrect slice end and can panic.
   - Rust behavior diverges from Java client behavior.
   
   ## Root Cause
   In `crates/fluss/src/record/arrow.rs`:
   - `checksum()` and `schema_id()` used `CRC_OFFSET` and `SCHEMA_ID_OFFSET` as slice lengths instead of `CRC_LENGTH` and `SCHEMA_ID_LENGTH`.
   - `compute_checksum()` used `start + self.data.len()` as the slice end, overshooting the buffer by `start` bytes.
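   To make the failure mode concrete, here is a minimal sketch (with made-up sizes, not the crate's real offsets) of why the reported `start + self.data.len()` end bound panics while the corrected bound stays in range:

   ```rust
   /// Buggy end bound as reported: `start + data.len()` overshoots the
   /// buffer by `start` bytes whenever `start > 0`, so slicing with it panics.
   fn buggy_crc_range(data: &[u8], start: usize) -> (usize, usize) {
       (start, start + data.len())
   }

   /// Fixed bound: the CRC should simply cover from `start` to the end.
   fn fixed_crc_range(data: &[u8], start: usize) -> (usize, usize) {
       (start, data.len())
   }
   ```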
   
   ## Expected Behavior
   - `checksum()` reads exactly 4 bytes (`CRC_LENGTH`) at `CRC_OFFSET`.
   - `schema_id()` reads exactly 2 bytes (`SCHEMA_ID_LENGTH`) at 
`SCHEMA_ID_OFFSET`.
   - `compute_checksum()` computes CRC from `SCHEMA_ID_OFFSET` to the end of 
the batch.
   
   ## Compatibility Note
   - The Rust layout/offsets currently assume V0 only, matching Java's V0 format. Java also supports V1, so Rust will need a V1-aligned layout later to keep parity.
   
   ## Evidence
   A minimal-header test fails on the pre-fix Rust code with `range end index ... out of range` and passes after the fix:
   - `checksum_and_schema_id_read_minimum_header` (Rust unit test)
   
   
   ### Solution
   
   ## Proposed Fix
   Update `LogRecordBatch` in `crates/fluss/src/record/arrow.rs`:
   - `compute_checksum()` => `crc32c(&self.data[SCHEMA_ID_OFFSET..])`
   - `checksum()` => `read_u32(&self.data[CRC_OFFSET..CRC_OFFSET + CRC_LENGTH])`
   - `schema_id()` => `read_i16(&self.data[SCHEMA_ID_OFFSET..SCHEMA_ID_OFFSET + 
SCHEMA_ID_LENGTH])`
   
   ## Tests
   Add a unit test that builds a buffer with only the minimal header and 
verifies:
   - `checksum()` returns the written CRC.
   - `schema_id()` returns the written schema id.
   - `compute_checksum()` matches `crc32c(&data[SCHEMA_ID_OFFSET..])`.
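   A sketch of such a test, using illustrative constants (the real offsets live in `arrow.rs`), little-endian reads, and a small software CRC-32C standing in for the `crc32c` crate — all assumptions for illustration:

   ```rust
   // Illustrative V0-style layout: a 4-byte CRC followed by a 2-byte schema id.
   // These constants are assumptions for this sketch, not the crate's values.
   const CRC_OFFSET: usize = 8;
   const CRC_LENGTH: usize = 4;
   const SCHEMA_ID_OFFSET: usize = CRC_OFFSET + CRC_LENGTH;
   const SCHEMA_ID_LENGTH: usize = 2;
   const HEADER_SIZE: usize = SCHEMA_ID_OFFSET + SCHEMA_ID_LENGTH;

   /// Read exactly CRC_LENGTH (4) bytes at CRC_OFFSET.
   fn checksum(data: &[u8]) -> u32 {
       u32::from_le_bytes(
           data[CRC_OFFSET..CRC_OFFSET + CRC_LENGTH].try_into().unwrap(),
       )
   }

   /// Read exactly SCHEMA_ID_LENGTH (2) bytes at SCHEMA_ID_OFFSET.
   fn schema_id(data: &[u8]) -> i16 {
       i16::from_le_bytes(
           data[SCHEMA_ID_OFFSET..SCHEMA_ID_OFFSET + SCHEMA_ID_LENGTH]
               .try_into()
               .unwrap(),
       )
   }

   /// CRC over everything from SCHEMA_ID_OFFSET to the end of the batch.
   fn compute_checksum(data: &[u8]) -> u32 {
       crc32c(&data[SCHEMA_ID_OFFSET..])
   }

   /// Minimal bitwise CRC-32C (Castagnoli), standing in for the `crc32c` crate.
   fn crc32c(data: &[u8]) -> u32 {
       let mut crc: u32 = !0;
       for &byte in data {
           crc ^= byte as u32;
           for _ in 0..8 {
               let mask = (crc & 1).wrapping_neg();
               crc = (crc >> 1) ^ (0x82F6_3B78 & mask);
           }
       }
       !crc
   }
   ```

   With these bounds, a buffer of exactly `HEADER_SIZE` bytes can be read without any out-of-range panic, which is the scenario the original code failed on.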
   
   ## Java Implementation Details
   - `DefaultLogRecordBatch.computeChecksum()` wraps 
`MemorySegment.wrap(position, sizeInBytes())` into a `ByteBuffer`, starts at 
`schemaIdOffset(magic)`, uses length `sizeInBytes() - schemaIdOffset(magic)`, 
and calls `Crc32C.compute`.
   - `DefaultLogRecordBatch.checksum()` reads 4-byte CRC via 
`segment.getUnsignedInt(crcOffset(magic) + position)`; `schemaId()` reads 
2-byte schema id via `segment.getShort(schemaIdOffset(magic) + position)`.
   - `ensureValid()` checks `sizeInBytes() >= recordBatchHeaderSize(magic)` 
first, then compares `checksum()` with `computeChecksum()`; `isValid()` follows 
the same logic.
   - `crcOffset(magic)`, `schemaIdOffset(magic)`, and 
`recordBatchHeaderSize(magic)` are derived from `LogRecordBatchFormat` based on 
`magic`, avoiding hard-coded offsets that can go out of bounds.
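   Ported to Rust, the same defensive ordering might look like the sketch below. The constants, little-endian reads, and the software CRC-32C are illustrative assumptions, and the `magic`-derived offsets are collapsed to fixed values for brevity:

   ```rust
   // Illustrative V0-style constants; the real values should be derived from
   // the batch format, as Java derives them from LogRecordBatchFormat.
   const CRC_OFFSET: usize = 8;
   const CRC_LENGTH: usize = 4;
   const SCHEMA_ID_OFFSET: usize = CRC_OFFSET + CRC_LENGTH;
   const HEADER_SIZE: usize = SCHEMA_ID_OFFSET + 2;

   /// Mirrors Java's ensureValid()/isValid() ordering: size guard first, so
   /// no offset is ever dereferenced on a truncated batch, then CRC compare.
   fn is_valid(data: &[u8]) -> bool {
       if data.len() < HEADER_SIZE {
           return false; // shorter than the minimal header
       }
       let stored = u32::from_le_bytes(
           data[CRC_OFFSET..CRC_OFFSET + CRC_LENGTH].try_into().unwrap(),
       );
       stored == crc32c(&data[SCHEMA_ID_OFFSET..])
   }

   /// Minimal bitwise CRC-32C (Castagnoli), standing in for the `crc32c` crate.
   fn crc32c(data: &[u8]) -> u32 {
       let mut crc: u32 = !0;
       for &byte in data {
           crc ^= byte as u32;
           for _ in 0..8 {
               let mask = (crc & 1).wrapping_neg();
               crc = (crc >> 1) ^ (0x82F6_3B78 & mask);
           }
       }
       !crc
   }
   ```

   Checking the size before reading any field is what keeps the Java client from going out of bounds even with hard-coded offsets removed from the picture.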
   
   ## References
   - Java implementation: 
`fluss/fluss-common/src/main/java/org/apache/fluss/record/DefaultLogRecordBatch.java`
   
   
   ### Are you willing to submit a PR?
   
   - [ ] I'm willing to submit a PR!

