zeroshade opened a new pull request, #858:
URL: https://github.com/apache/arrow-go/pull/858

   ### Rationale for this change
   
   `array.RecordBuilder.NewRecordBatch` (and the deprecated `NewRecord`) 
silently
   produced records whose field was declared `Nullable: false` but whose column
   actually contained nulls. The Rust (`RecordBatch::try_new`) and C++
   implementations reject this; arrow-go accepted it, yielding spec-invalid
   records. Fixes #372.
   
   ### What changes are included in this PR?
   
   - **`RecordBuilder`**: `NewRecordBatch` now rejects a record whose 
**top-level**
     field is non-nullable but whose column contains nulls (it panics, 
consistent
     with its existing whole-builder length-mismatch guard). A new
     `RecordBuilder.NewRecordBatchChecked() (arrow.RecordBatch, error)` returns 
the
     violation as an error instead of panicking.
   - The check is **scoped to top-level record columns only**. A non-nullable 
child
     of a *nullable* struct is unaffected (its nulls are legitimate where the 
parent
     slot is null), so existing nested data keeps working. 
`array.NewRecordBatch`
     and the lower-level constructors are unchanged.
   - **Readers**: the CSV, JSON, and Avro readers build through
     `NewRecordBatchChecked` and surface a typed read error instead of building 
an
     invalid record or panicking.
   
   ### Are these changes tested?
   
   Yes. A new `TestRecordBuilderNonNullableNulls` covers rejection (error + 
panic),
   nullable fields with nulls allowed, non-nullable without nulls allowed, a
   nullable struct with a non-nullable child allowed, and the empty-builder 
case.
   Pre-existing fixtures that declared non-nullable fields while appending nulls
   were corrected to mark those fields nullable. The full `arrow/...` and
   `parquet/...` suites pass.
   
   ### Are there any user-facing changes?
   
   Yes:
   
   - New `RecordBuilder.NewRecordBatchChecked() (arrow.RecordBatch, error)`.
   - `RecordBuilder.NewRecordBatch()` / `NewRecord()` now reject (panic) a 
top-level
     non-nullable column that contains nulls; previously such records were built
     silently. This is a behavior change for code that relied on the prior lax
     behavior — the fix is to declare the field `Nullable: true`.
   - The CSV, JSON, and Avro readers now return an error when input would place 
a
     null into a non-nullable top-level column.
   
   **Potential follow-ups (intentionally out of scope here):** recursive 
validation
   of nested non-nullable fields, and logical-null detection for non-nullable
   run-end-encoded columns.
   
   Closes #372
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to