zeroshade opened a new pull request, #747:
URL: https://github.com/apache/arrow-go/pull/747

   ## Summary
   
   Fixes #691
   
   Adds `Validate()` and `ValidateFull()` methods to `Binary`, `LargeBinary`, 
`String`, and `LargeString` array types, plus top-level dispatch functions and 
record-level convenience helpers.
   
   ## Problem
   
   The existing `setData` validation only checks the **last** offset against 
the data buffer length. Subtly corrupted data (e.g. non-monotonic or negative 
intermediate offsets) passes construction but causes a runtime `panic: slice 
bounds out of range` when `Value(i)` is called later, **after** the IPC 
reader's `recover()` scope has already returned. Users receiving data from 
untrusted sources (e.g. Flight SQL from Doris DB) have no way to detect this 
corruption before it crashes the process.
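
To make the failure mode concrete, here is a minimal sketch on plain Go slices rather than real Arrow buffers. The helper names (`lastOffsetOK`, `valueAt`, `tryValueAt`) are hypothetical and are not part of arrow-go; they only imitate a last-offset-only check and the slicing that a `Value(i)` accessor has to perform.

```go
package main

import "fmt"

// lastOffsetOK imitates a construction-time check that inspects only the
// final offset (hypothetical helper, not the arrow-go implementation).
func lastOffsetOK(offsets []int32, dataLen int) bool {
	if len(offsets) == 0 {
		return true
	}
	last := offsets[len(offsets)-1]
	return last >= 0 && int(last) <= dataLen
}

// valueAt slices the data buffer between two consecutive offsets, the way a
// variable-length binary accessor would. It panics if the offsets are bad.
func valueAt(offsets []int32, data []byte, i int) []byte {
	return data[offsets[i]:offsets[i+1]]
}

// tryValueAt wraps valueAt and converts a slice-bounds panic into an error,
// purely so the failure can be observed without crashing this sketch.
func tryValueAt(offsets []int32, data []byte, i int) (v []byte, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("value %d: %v", i, r)
		}
	}()
	return valueAt(offsets, data, i), nil
}
```

With offsets `{0, 5, 3, 8}` over an 8-byte buffer, `lastOffsetOK` reports success because the last offset is in range, yet `valueAt` for index 1 attempts `data[5:3]` and panics.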
   
   ## Solution
   
   - `Validate()` — O(1): checks offset buffer size and that the last offset is 
within the data buffer (mirrors existing `setData` checks, but returns an error 
instead of panicking)
   - `ValidateFull()` — O(n): additionally verifies all offsets are 
non-negative and monotonically non-decreasing, catching the subtle corruption 
case
   - `Validate(arr arrow.Array) error` / `ValidateFull(arr arrow.Array) error` 
— top-level dispatch via the new `Validator` interface
   - `ValidateRecord(rec arrow.RecordBatch) error` / `ValidateRecordFull(...)` 
— convenience wrappers that validate all columns, with error messages including 
column index and name
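
The difference between the two validation levels can be sketched on a plain `[]int32` offsets slice. This is an illustration under assumptions, not the code in this PR: the function names, parameters, and error messages below are hypothetical, and a real array would also have to account for its slice offset within the buffers.

```go
package main

import "fmt"

// validateOffsets is the O(1) check: the offsets slice must hold length+1
// entries and the last offset must fall within the data buffer.
// (Hypothetical helper operating on plain slices.)
func validateOffsets(offsets []int32, length, dataLen int) error {
	if length == 0 {
		return nil // an empty array may carry no offsets at all
	}
	if len(offsets) < length+1 {
		return fmt.Errorf("offset buffer has %d entries, need %d", len(offsets), length+1)
	}
	if last := offsets[length]; last < 0 || int(last) > dataLen {
		return fmt.Errorf("last offset %d is outside data buffer of length %d", last, dataLen)
	}
	return nil
}

// validateOffsetsFull is the O(n) check: on top of validateOffsets it walks
// every offset and requires a non-negative, non-decreasing sequence.
func validateOffsetsFull(offsets []int32, length, dataLen int) error {
	if err := validateOffsets(offsets, length, dataLen); err != nil {
		return err
	}
	if length == 0 {
		return nil
	}
	prev := offsets[0]
	if prev < 0 {
		return fmt.Errorf("offset 0 is negative: %d", prev)
	}
	for i := 1; i <= length; i++ {
		cur := offsets[i]
		if cur < prev {
			return fmt.Errorf("offset %d (%d) is less than offset %d (%d)", i, cur, i-1, prev)
		}
		prev = cur
	}
	return nil
}
```

Because the first offset is non-negative, the sequence is non-decreasing, and the last offset is within the data buffer, every intermediate offset is in range as well, so no per-element bounds check against the data length is needed in the loop.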
   
   ## Usage
   
   ```go
   rec, err := reader.Read()
   if err != nil { ... }
   if err := array.ValidateRecordFull(rec); err != nil {
       log.Printf("skipping corrupted batch: %v", err)
       rec.Release()
       continue
   }
   ```
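
For readers curious how a record-level helper like the one used above might dispatch to per-array validation, here is a rough sketch. Every type and function in it (`column`, `fullValidator`, `namedColumn`, `validateColumnsFull`, and the two toy column types) is a hypothetical stand-in for illustration; the actual `Validator` interface and `ValidateRecordFull` signature are defined in the PR itself.

```go
package main

import "fmt"

// column is a hypothetical stand-in for arrow.Array.
type column interface{ Len() int }

// fullValidator is a hypothetical stand-in for the PR's Validator interface:
// only column types that need offset checks implement it.
type fullValidator interface{ ValidateFull() error }

// validateFull dispatches to the column's own check when it has one and
// passes other column types through unchanged.
func validateFull(c column) error {
	if v, ok := c.(fullValidator); ok {
		return v.ValidateFull()
	}
	return nil
}

// namedColumn pairs a column with its field name for error reporting.
type namedColumn struct {
	Name string
	Col  column
}

// validateColumnsFull validates every column and wraps the first failure
// with the column index and name.
func validateColumnsFull(cols []namedColumn) error {
	for i, nc := range cols {
		if err := validateFull(nc.Col); err != nil {
			return fmt.Errorf("column %d (%q): %w", i, nc.Name, err)
		}
	}
	return nil
}

// intColumn is a toy fixed-width column with nothing to validate.
type intColumn struct{ n int }

func (c intColumn) Len() int { return c.n }

// binaryColumn is a toy variable-length column described only by offsets.
type binaryColumn struct{ offsets []int32 }

func (c binaryColumn) Len() int {
	if len(c.offsets) == 0 {
		return 0
	}
	return len(c.offsets) - 1
}

// ValidateFull rejects decreasing offsets (a simplified monotonicity check).
func (c binaryColumn) ValidateFull() error {
	for i := 1; i < len(c.offsets); i++ {
		if c.offsets[i] < c.offsets[i-1] {
			return fmt.Errorf("offset %d (%d) is less than offset %d (%d)",
				i, c.offsets[i], i-1, c.offsets[i-1])
		}
	}
	return nil
}
```

Using an optional interface keeps fixed-width types untouched: only column types that can actually hold inconsistent offsets pay for the check, and the wrapped error tells the caller which column to blame.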
   
   ## Test plan
   
   - [ ] `TestBinaryValidate` — valid arrays, sliced arrays, non-monotonic 
offsets, negative first offset
   - [ ] `TestLargeBinaryValidate` — same for large binary
   - [ ] `TestStringValidate` — same for string
   - [ ] `TestLargeStringValidate` — same for large string
   - [ ] `TestTopLevelValidate` — dispatch to `Validator`, passthrough for 
non-`Validator` types, `ValidateRecord` with mixed valid/corrupt columns

