zeroshade opened a new pull request, #747:
URL: https://github.com/apache/arrow-go/pull/747
## Summary
Fixes #691
Adds `Validate()` and `ValidateFull()` methods to `Binary`, `LargeBinary`,
`String`, and `LargeString` array types, plus top-level dispatch functions and
record-level convenience helpers.
## Problem
The existing `setData` validation only checks the **last** offset against
the data buffer length. Subtly corrupted data — e.g. non-monotonic or negative
intermediate offsets — passes construction but causes a runtime `panic: slice
bounds out of range` when `Value(i)` is called later, **after** the IPC
reader's `recover()` scope has already returned. Users receiving data from
untrusted sources (e.g. Flight SQL from Doris DB) have no way to detect this
without crashing.
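You can reproduce the failure mode without arrow-go at all. The sketch below uses plain Go slices. `lastOffsetOK` is a hypothetical stand-in for a last-offset-only check (it is not the actual `setData` code). `value` slices `data[offsets[i]:offsets[i+1]]`, which is roughly what a binary/string accessor does. With offsets `{0, 5, 3, 10}` over a 10-byte buffer, the last-offset check passes, but element 1 slices `data[5:3]` and panics. `tryValue` recovers only so the demo can report the panic.

```go
package main

import "fmt"

// lastOffsetOK models a check that only compares the final offset against
// the data buffer length (hypothetical stand-in, not arrow-go code).
func lastOffsetOK(offsets []int32, data []byte) bool {
	return len(offsets) > 0 && int(offsets[len(offsets)-1]) <= len(data)
}

// value returns element i as data[offsets[i]:offsets[i+1]]; it panics
// with "slice bounds out of range" when the offsets are inconsistent.
func value(offsets []int32, data []byte, i int) []byte {
	return data[offsets[i]:offsets[i+1]]
}

// tryValue turns the slice-bounds panic into an error for demonstration.
func tryValue(offsets []int32, data []byte, i int) (v []byte, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("element %d: %v", i, r)
		}
	}()
	return value(offsets, data, i), nil
}

func main() {
	data := []byte("helloworld")
	// Non-monotonic: offsets[2] (3) < offsets[1] (5), but the last offset
	// (10) equals len(data), so the last-offset check is satisfied.
	offsets := []int32{0, 5, 3, 10}
	fmt.Println("last-offset check passes:", lastOffsetOK(offsets, data))
	for i := 0; i < len(offsets)-1; i++ {
		if _, err := tryValue(offsets, data, i); err != nil {
			fmt.Println("corrupt:", err)
		}
	}
}
```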
## Solution
- `Validate()` — O(1): checks offset buffer size and that the last offset is
within the data buffer (mirrors existing `setData` checks, but returns an error
instead of panicking)
- `ValidateFull()` — O(n): additionally verifies all offsets are
non-negative and monotonically non-decreasing, catching the subtle corruption
case
- `Validate(arr arrow.Array) error` / `ValidateFull(arr arrow.Array) error`
— top-level dispatch via the new `Validator` interface
- `ValidateRecord(rec arrow.RecordBatch) error` / `ValidateRecordFull(...)`
— convenience wrappers that validate all columns, with error messages including
column index and name
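As a rough model of the two validation tiers, the sketch below works on a raw `[]int32` offsets buffer plus an array offset and length (to cover sliced arrays), not on arrow-go array types. `validateOffsets` and `validateOffsetsFull` are hypothetical names, and the PR's methods may differ in error wording and edge-case handling. The `Large*` variants would apply the same logic over `[]int64`. The O(n) pass only needs to check that the first offset is non-negative and that offsets never decrease. Combined with the O(1) bound on the last offset, that places every element inside the data buffer.

```go
package main

import (
	"errors"
	"fmt"
)

// validateOffsets is an O(1) check in the spirit of Validate(): the buffer
// must hold length+1 offsets starting at arrOffset, and the last offset the
// (possibly sliced) array uses must lie within the data buffer.
func validateOffsets(offsets []int32, dataLen, arrOffset, length int) error {
	if arrOffset < 0 || length < 0 {
		return errors.New("negative array offset or length")
	}
	if length == 0 {
		return nil // an empty array may legitimately carry no offsets
	}
	if need := arrOffset + length + 1; len(offsets) < need {
		return fmt.Errorf("offsets buffer has %d entries, need at least %d", len(offsets), need)
	}
	if last := offsets[arrOffset+length]; last < 0 || int(last) > dataLen {
		return fmt.Errorf("last offset %d outside data buffer of length %d", last, dataLen)
	}
	return nil
}

// validateOffsetsFull is an O(n) check in the spirit of ValidateFull(): on
// top of validateOffsets, every offset in the array's window must be
// non-negative and no smaller than its predecessor.
func validateOffsetsFull(offsets []int32, dataLen, arrOffset, length int) error {
	if err := validateOffsets(offsets, dataLen, arrOffset, length); err != nil {
		return err
	}
	if length == 0 {
		return nil
	}
	window := offsets[arrOffset : arrOffset+length+1]
	if window[0] < 0 {
		return fmt.Errorf("offset at index %d is negative: %d", arrOffset, window[0])
	}
	for i := 1; i < len(window); i++ {
		if window[i] < window[i-1] {
			return fmt.Errorf("offsets not monotonic at index %d: %d < %d",
				arrOffset+i, window[i], window[i-1])
		}
	}
	return nil
}

func main() {
	data := []byte("helloworld")
	corrupt := []int32{0, 5, 3, 10}
	fmt.Println(validateOffsets(corrupt, len(data), 0, 3))     // <nil>: O(1) check misses it
	fmt.Println(validateOffsetsFull(corrupt, len(data), 0, 3)) // reports the monotonicity violation
}
```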
## Usage
```go
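for {
	rec, err := reader.Read()
	if errors.Is(err, io.EOF) {
		break // end of stream
	}
	if err != nil {
		return err
	}
	if err := array.ValidateRecordFull(rec); err != nil {
		log.Printf("skipping corrupted batch: %v", err)
		continue // with *ipc.Reader, rec is released by the next Read call
	}
	// rec's string/binary columns are safe to read here
}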
```
## Test plan
- [ ] `TestBinaryValidate` — valid arrays, sliced arrays, non-monotonic
offsets, negative first offset
- [ ] `TestLargeBinaryValidate` — same for large binary
- [ ] `TestStringValidate` — same for string
- [ ] `TestLargeStringValidate` — same for large string
- [ ] `TestTopLevelValidate` — dispatch to `Validator`, passthrough for
non-`Validator` types, `ValidateRecord` with mixed valid/corrupt columns