I agree with the previous comments that definition 3 is probably the best choice moving forward.
I wanted to bring attention to a related (but slightly different) issue: should arrays be "equal" (as returned by functions like `arrow.Equal` and `arrow.RecordEqual`) only if they are "binary" (bit-for-bit) equal, or also if they are merely "semantically" equal? Should this even be part of the standard?

I ran into this issue when implementing timestamp-with-offset for Go. It looks like `arrow.RecordFromJSON()` returns a field with `nulls=1` even if the field is non-nullable, while my implementation was returning `nulls=0`. When I round-tripped a value through the JSON encoder/decoder and checked it for equality, the check failed because `original.NullN() != roundtripped.NullN()`.

If we choose the "binary" equality route, then the implementation of `arrow.Equal()` is correct in comparing `NullN()` for a non-nullable field, and there is a bug in `arrow.RecordFromJSON()` that makes it return `nulls=1` in that case. If we choose the "semantic" equality route, then we can consider the `nulls=1` as garbage, and the implementation of `arrow.Equal()` should be relaxed to skip comparing `NullN()` when the field is not nullable.

On Friday, January 30th, 2026 at 15:19, Weston Pace <[email protected]> wrote:

> I agree with Raphael that this is probably too late to change. There are
> many tools out there that produce Arrow data now and they are not all going
> to conform to definition 1. In fact, as Antoine points out, many tools do
> not even guarantee validity at all (a batch created with pyarrow may have a
> field marked non-nullable that has nulls).
>
> As a result, my personal stance has been to ignore the nullability flag on
> all external data and independently determine whether an array has or does
> not have nulls.
> > the problem I have is that this is an undefined behavior, the accepted
> > behavior can be (I don't think this should be the behavior) that there
> > should be no requirement on the child nulls, and it can have nulls
> > anywhere they want even if the parent does not have null there.
>
> There is very little mention of the nullable flag in the spec at all. The
> only thing I see is:
>
> > Whether the field is semantically nullable. While this has no bearing on
> > the array's physical layout, many systems distinguish nullable and
> > non-nullable fields and we want to allow them to preserve this metadata
> > to enable faithful schema round trips.
>
> Since the spec explicitly states "this has no bearing on the array's
> physical layout" I think your accepted behavior could, in fact, be seen as
> valid, if not wise.
>
> That being said, my view might be a little out there :). I am content if
> we want to consolidate on a definition. I think definition 3 is the most
> flexible and likely to be adopted.
>
> On Thu, Jan 29, 2026 at 11:55 AM Raz Luvaton [email protected] wrote:
>
> > > If something had been standardised at the start that would be one
> > > thing, but retroactively adding schema restrictions now is likely to
> > > break existing workflows, and is therefore probably best avoided.
> >
> > the problem I have is that this is an undefined behavior, the accepted
> > behavior can be (I don't think this should be the behavior) that there
> > should be no requirement on the child nulls, and it can have nulls
> > anywhere they want even if the parent does not have null there.
> >
> > On 2026/01/29 19:40:01 Raphael Taylor-Davies wrote:
> >
> > > For what it is worth arrow-rs takes the most permissive interpretation
> > > 3 - we only reject unambiguously malformed StructArray. For further
> > > context I believe the instigator of this email thread is 1.
> > >
> > > I think the main question with taking one of the more strict
> > > interpretations is what value is assigned to "masked" values when
> > > parsing from some other format, such as JSON or parquet, that doesn't
> > > encode them. Some people think it should be NULL, others arbitrary. For
> > > example, when arrow-rs changed the parquet reader from using NULL to
> > > arbitrary it was actually reported as a bug 2.
> > >
> > > My 2 cents is that this is a bit like the question around whether
> > > StructArray can have fields with the same name. If something had been
> > > standardised at the start that would be one thing, but retroactively
> > > adding schema restrictions now is likely to break existing workflows,
> > > and is therefore probably best avoided.
> > >
> > > Kind Regards,
> > >
> > > Raphael
> > >
> > > On 29/01/2026 19:10, Raz Luvaton wrote:
> > >
> > > > Currently there is ambiguity on what the validity buffer for a non
> > > > nullable field of a nullable struct can be.
> > > >
> > > > Let's take for example the following type:
> > > > `nullable StructArray with non nullable field Int32`
> > > > The struct validity is: valid, null, null, valid.
> > > >
> > > > Which of the following should it be:
> > > >
> > > > 1. The child array (the int32 array) is FORBIDDEN from having nulls
> > > > at all (i.e. in our example the validity buffer for the child must be
> > > > valid, valid, valid, valid), as the field is marked as non nullable?
> > > > 2. The child array is REQUIRED to have nulls at the same positions as
> > > > the struct nulls, i.e. the validity buffer for the child MUST be
> > > > valid, null, null, valid in our example?
> > > > 3. The child array MAY have nulls but it is FORBIDDEN to have nulls
> > > > where the struct does not have nulls, i.e. it can't have null, null,
> > > > valid, valid but it can have valid, null, valid, valid in our example.
> > > >
> > > > I would argue that 1 is the correct and expected requirement, as the
> > > > field is marked as non nullable.
> > > >
> > > > The chosen behavior will be applicable for other nested types as well.
> > > >
> > > > Thanks,
> > > > Raz Luvaton
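To make the two questions in this thread concrete, here is a minimal Go sketch. It is NOT the arrow-go API: validity buffers are modeled as plain bool slices (true = valid), and all names (`validPerDefinition3`, `nullCountEqual`) are invented for illustration.

```go
package main

import "fmt"

// validPerDefinition3 checks definition 3 from the thread: the child of a
// nullable struct MAY have nulls, but only at positions where the struct
// itself is null. A child null at a position where the struct is valid is
// forbidden.
func validPerDefinition3(structValidity, childValidity []bool) bool {
	for i := range structValidity {
		if structValidity[i] && !childValidity[i] {
			// child is null where the struct is valid: violation
			return false
		}
	}
	return true
}

// nullCountEqual sketches the "binary" vs "semantic" equality choice from
// the top of the thread: under semantic equality, the stored null count of
// a non-nullable field is ignored (treated as garbage); under binary
// equality it must match exactly.
func nullCountEqual(a, b int, fieldNullable, semantic bool) bool {
	if semantic && !fieldNullable {
		return true
	}
	return a == b
}

func main() {
	// Struct validity from the example: valid, null, null, valid.
	s := []bool{true, false, false, true}

	// valid, null, valid, valid: allowed under definition 3.
	fmt.Println(validPerDefinition3(s, []bool{true, false, true, true})) // true

	// null, null, valid, valid: forbidden (null at a valid struct slot).
	fmt.Println(validPerDefinition3(s, []bool{false, false, true, true})) // false

	// nulls=1 vs nulls=0 on a non-nullable field, as in the JSON round trip:
	fmt.Println(nullCountEqual(1, 0, false, true))  // true  (semantic)
	fmt.Println(nullCountEqual(1, 0, false, false)) // false (binary)
}
```

Under this sketch, the `arrow.RecordFromJSON()` round-trip failure described above is exactly the binary case in the last line: the payloads are semantically identical, but the stored null counts differ.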
publickey - [email protected] - 0x0A7793AD.asc
Description: application/pgp-keys
signature.asc
Description: OpenPGP digital signature
