NickCrews commented on code in PR #43782:
URL: https://github.com/apache/arrow/pull/43782#discussion_r1890294211
##########
cpp/src/arrow/compute/kernels/scalar_cast_nested.cc:
##########
@@ -404,14 +400,20 @@ struct CastStruct {
int out_field_index = 0;
for (int field_index : fields_to_select) {
- const auto& target_type = out->type()->field(out_field_index++)->type();
+ const auto& target_field = out->type()->field(out_field_index++);
+ const auto& target_type = target_field->type();
if (field_index == kFillNullSentinel) {
ARROW_ASSIGN_OR_RAISE(auto nulls,
MakeArrayOfNull(target_type->GetSharedPtr(),
batch.length));
out_array->child_data.push_back(nulls->data());
} else {
const auto& values =
(in_array.child_data[field_index].ToArrayData()->Slice(
in_array.offset, in_array.length));
+ if (!target_field->nullable() && values->null_count > 0) {
Review Comment:
I have never heard of kUnknownNullCount. I'm not sure what we should do
here, I think I need to know a lot more background. Can someone who has
experience with null counts help?
Some questions this brings up:
1. This sounds like we sometimes don't know the null count, I assume because
it is expensive to compute. Would this be a good reason to actually force
computation to happen (behavior A), or should we be conservative, and if we
find kUnkownNullCount, then we assume that there could be nulls and don't let
the cast happen and therefore error (behavior B). Behavior A seems much more
correct to me.
2. I don't think this function should have the responsibility of looking at
the child arrays null counts. Should we add some higher level API to arrays
like nullCountOrChildNullCount() that actually does this messy work??
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]