zeroshade opened a new pull request, #858:
URL: https://github.com/apache/arrow-go/pull/858
### Rationale for this change
`array.RecordBuilder.NewRecordBatch` (and the deprecated `NewRecord`)
silently
produced records whose field was declared `Nullable: false` but whose column
actually contained nulls. The Rust (`RecordBatch::try_new`) and C++
implementations reject this; arrow-go accepted it, yielding spec-invalid
records. Fixes #372.
### What changes are included in this PR?
- **`RecordBuilder`**: `NewRecordBatch` now rejects a record whose
**top-level**
field is non-nullable but whose column contains nulls (it panics,
consistent
with its existing whole-builder length-mismatch guard). A new
`RecordBuilder.NewRecordBatchChecked() (arrow.RecordBatch, error)` returns
the
violation as an error instead of panicking.
- The check is **scoped to top-level record columns only**. A non-nullable
child
of a *nullable* struct is unaffected (its nulls are legitimate where the
parent
slot is null), so existing nested data keeps working.
`array.NewRecordBatch`
and the lower-level constructors are unchanged.
- **Readers**: the CSV, JSON, and Avro readers build through
`NewRecordBatchChecked` and surface a typed read error instead of building
an
invalid record or panicking.
### Are these changes tested?
Yes. A new `TestRecordBuilderNonNullableNulls` covers rejection (error +
panic),
nullable fields with nulls allowed, non-nullable without nulls allowed, a
nullable struct with a non-nullable child allowed, and the empty-builder
case.
Pre-existing fixtures that declared non-nullable fields while appending nulls
were corrected to mark those fields nullable. The full `arrow/...` and
`parquet/...` suites pass.
### Are there any user-facing changes?
Yes:
- New `RecordBuilder.NewRecordBatchChecked() (arrow.RecordBatch, error)`.
- `RecordBuilder.NewRecordBatch()` / `NewRecord()` now reject (panic) a
top-level
non-nullable column that contains nulls; previously such records were built
silently. This is a behavior change for code that relied on the prior lax
behavior — the fix is to declare the field `Nullable: true`.
- The CSV, JSON, and Avro readers now return an error when input would place
a
null into a non-nullable top-level column.
**Potential follow-ups (intentionally out of scope here):** recursive
validation
of nested non-nullable fields, and logical-null detection for non-nullable
run-end-encoded columns.
Closes #372
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]