zeroshade commented on code in PR #858:
URL: https://github.com/apache/arrow-go/pull/858#discussion_r3461462272
##########
arrow/array/record.go:
##########
@@ -231,6 +231,28 @@ func (rec *simpleRecord) validate() error {
return nil
}
+// nullCountInRows returns the number of top-level nulls among the first nrows
+// rows of arr. It does not recurse into child arrays, so a nullable struct
whose
+// non-nullable children carry nulls in null parent slots is not flagged; only
a
+// field's own top-level nullability is enforced. Run-end-encoded logical
nulls,
+// held in the values child rather than a top-level bitmap, are not counted.
+func nullCountInRows(arr arrow.Array, nrows int64) int {
Review Comment:
Good call — done in 3b128df0. Dropped the per-row scan (and the
`nullCountInRows` helper); the check is now just `!f.Nullable &&
cols[i].NullN() > 0`. `NullN()` is the cached null count, so since we only need
to know whether *any* nulls are present (not the exact count) there's no reason
to walk the column. The error message no longer reports a count.
##########
arrow/array/record.go:
##########
@@ -385,11 +407,26 @@ func (b *RecordBuilder) Resize(n int) {
//
// The returned RecordBatch must be Release()'d after use.
//
-// NewRecordBatch panics if the fields' builder do not have the same length.
+// NewRecordBatch panics if the fields' builders do not have the same length
or if
+// a non-nullable field contains nulls. Use NewRecordBatchChecked for an error.
Review Comment:
I dug into deprecating it, but it turns out to be disproportionate for this
PR. `staticcheck`'s SA1019 fires on cross-package use, so deprecating
`NewRecordBatch` would mean migrating ~76 call sites across ~20 files to
`NewRecordBatchChecked` (golangci-lint caps output at 3 per message, which hid
the real count at first). That set also includes the new
`TestRecordBuilderNonNullableNulls`, which deliberately calls
`NewRecordBatch()` to assert it panics, so it would need a
`//nolint:staticcheck,SA1019`.
Since `NewRecordBatch` already panics today on a whole-builder length
mismatch, rejecting nulls in a non-nullable column reads as a consistent
extension of that existing contract rather than brand-new panic behavior — and
`NewRecordBatchChecked` is there for callers who'd rather have an error. So for
now I've left `NewRecordBatch` as-is with the clarified doc. Happy to do the
deprecation + full caller migration as a dedicated follow-up if you think the
stronger signal is worth it; it just felt too sweeping to fold into this fix.
WDYT?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]