On Fri, Feb 18, 2022 at 1:14 PM Jorge Cardoso Leitão <jorgecarlei...@gmail.com> wrote:

> Isn't field-0 representing ["joe", None, None, "mark"]? validity is
> "00001001" and offsets [0,3,3,7]. My reading is that the values buffer is
> "joemark" because we do not represent values in null slots.
>

Hm, maybe I'm misunderstanding something. My current understanding is:

Each field's array has a logical length, which must match the logical
length of the parent struct. The logical length may or may not be equal to
the length of the values buffer. In the example, the logical length is 4
but only the "age" field's values buffer has length 4.

The description underneath the example says:

> While a struct does not have physical storage for each of its semantic slots
> (i.e. each scalar C-like struct), an entire struct slot can be set to null
> via the validity bitmap.

To me this suggests that appending a sentinel value to the values buffer
for a field is allowed, but not required.

Am I understanding this correctly?

>
> Best,
> Jorge
>
>
> On Fri, Feb 18, 2022 at 7:07 PM Phillip Cloud <cpcl...@gmail.com> wrote:
>
> > My read of the spec for structs [1] is that there is no requirement to have
> > a value in child arrays where there are nulls, which suggests the
> > implementation conforms to the spec here.
> >
> > The example emphasizes this by showing the VarBinary column data as
> > "joemark" as opposed to something like "joe<garbage><garbage>mark".
> >
> > [1]: https://arrow.apache.org/docs/format/Columnar.html#struct-layout
> >
> > On Fri, Feb 18, 2022 at 12:53 PM Dominik Moritz <domor...@apache.org>
> > wrote:
> >
> > > Can someone clarify whether the spec is clear about the behavior?
> > >
> > > On Feb 18, 2022 at 07:23:19, Alfie Mountfield <a...@hash.ai> wrote:
> > >
> > > > Hello all,
> > > > I've raised a JIRA ticket (
> > > > https://issues.apache.org/jira/browse/ARROW-15705)
> > > > for this, but I'm still uncertain on my reading of the spec so I thought
> > > > I'd ask here to confirm I've understood it correctly.
> > > >
> > > > I believe that child arrays should always be the same length as the
> > > > struct array? It seems that in the JS implementation of Arrow though,
> > > > if you add a null value to a StructBuilder, it only modifies the
> > > > null-bitmap and doesn't actually try to append the null-value to the
> > > > children arrays. I'm guessing this is a bug.
> > > >
> > > > If so, is there anything I need to do to get the PR I've opened (
> > > > https://github.com/apache/arrow/pull/12451) in?
> > > >
> > > > Cheers,
> > > > Alfie
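
To make the distinction between logical length and values-buffer length concrete, here is a minimal Python sketch that decodes the field-0 buffers discussed above. It is only an illustration under stated assumptions: `decode_varbinary` is a hypothetical helper, not part of any Arrow library, and the offsets are written as the five-entry buffer [0, 3, 3, 3, 7] that a length-4 variable-size array needs (the thread quotes it as [0,3,3,7]). The validity bitmap is read with least-significant-bit numbering, as in the Arrow columnar format.

```python
def decode_varbinary(length, validity, offsets, values):
    """Decode a variable-size binary array (hypothetical helper, pure Python)."""
    out = []
    for i in range(length):
        # Validity bit i lives in byte i // 8, at bit position i % 8 (LSB first).
        is_valid = (validity[i // 8] >> (i % 8)) & 1
        if is_valid:
            # Slot i spans values[offsets[i]:offsets[i + 1]].
            out.append(values[offsets[i]:offsets[i + 1]].decode("utf-8"))
        else:
            # Null slot: nothing is read from the values buffer.
            out.append(None)
    return out


validity = bytes([0b00001001])  # slots 0 and 3 are valid
offsets = [0, 3, 3, 3, 7]       # assumed length + 1 = 5 entries
values = b"joemark"             # 7 bytes for 4 logical slots

names = decode_varbinary(4, validity, offsets, values)
print(names)
```

The logical length is 4 while the values buffer holds 7 bytes, and the two null slots contribute zero bytes because their start and end offsets are equal.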