zeroshade commented on code in PR #858:
URL: https://github.com/apache/arrow-go/pull/858#discussion_r3461462272


##########
arrow/array/record.go:
##########
@@ -231,6 +231,28 @@ func (rec *simpleRecord) validate() error {
        return nil
 }
 
+// nullCountInRows returns the number of top-level nulls among the first nrows
+// rows of arr. It does not recurse into child arrays, so a nullable struct 
whose
+// non-nullable children carry nulls in null parent slots is not flagged; only 
a
+// field's own top-level nullability is enforced. Run-end-encoded logical 
nulls,
+// held in the values child rather than a top-level bitmap, are not counted.
+func nullCountInRows(arr arrow.Array, nrows int64) int {

Review Comment:
   Good call — done in 3b128df0. Dropped the per-row scan (and the 
`nullCountInRows` helper); the check is now just `!f.Nullable && 
cols[i].NullN() > 0`. `NullN()` is the cached null count, so since we only need 
to know whether *any* nulls are present (not the exact count) there's no reason 
to walk the column. The error message no longer reports a count.
   



##########
arrow/array/record.go:
##########
@@ -385,11 +407,26 @@ func (b *RecordBuilder) Resize(n int) {
 //
 // The returned RecordBatch must be Release()'d after use.
 //
-// NewRecordBatch panics if the fields' builder do not have the same length.
+// NewRecordBatch panics if the fields' builders do not have the same length 
or if
+// a non-nullable field contains nulls. Use NewRecordBatchChecked for an error.

Review Comment:
   I dug into deprecating it, but it turns out to be disproportionate for this 
PR. `staticcheck`'s SA1019 fires on cross-package use, so deprecating 
`NewRecordBatch` would mean migrating ~76 call sites across ~20 files to 
`NewRecordBatchChecked` (golangci-lint caps output at 3 per message, which hid 
the real count at first). That set also includes the new 
`TestRecordBuilderNonNullableNulls`, which deliberately calls 
`NewRecordBatch()` to assert it panics, so it would need a 
`//nolint:staticcheck,SA1019`.
   
   Since `NewRecordBatch` already panics today on a whole-builder length 
mismatch, rejecting nulls in a non-nullable column reads as a consistent 
extension of that existing contract rather than brand-new panic behavior — and 
`NewRecordBatchChecked` is there for callers who'd rather have an error. So for 
now I've left `NewRecordBatch` as-is with the clarified doc. Happy to do the 
deprecation + full caller migration as a dedicated follow-up if you think the 
stronger signal is worth it; it just felt too sweeping to fold into this fix. 
WDYT?
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to