On Fri, Feb 18, 2022 at 2:32 PM Antoine Pitrou <anto...@python.org> wrote:

>
> On 18/02/2022 at 20:26, Phillip Cloud wrote:
> > On Fri, Feb 18, 2022 at 2:06 PM Antoine Pitrou <anto...@python.org>
> wrote:
> >
> >> On 18/02/2022 at 20:01, Phillip Cloud wrote:
> >>> I think I'm confused by where this appended value lives. Is it only a
> >>> logical value or does the value show up in memory?
> >>
> >> The logical value is null.  The appended value is only a physical value
> >> that shows up in memory but doesn't have any bearing on the logical
> >> value.
> >>
> >
> > Yes, but where does that value reside? Does it depend on the array type?
> > Is it garbage in the values buffer? Something else?
>
> Well, it obviously depends on the child array type, so it's difficult to
> answer more precisely.
>
> For example, if your child array is a fixed-width primitive array, then
> you can append a value of the given width, with whatever value.  You can
> also append a null in the child, but you still have to append to the
> values buffer anyway (since it's a fixed-width type).
>

Let's constrain it to the example in the spec.


>
> >>> For example, appending another null to the name field is only going to
> >>> change the validity map, offsets array and length, and there will not be
> >>> any changes to the values buffer.
> >>
> >> I may be missing some context, but what is the "name field" here?
> >>
> >
> > The field in the example in the spec:
> > https://arrow.apache.org/docs/format/Columnar.html#struct-layout
> >
>  > [...]
> >
> > If that were the case, I would expect garbage in between "joe" and "mark"
> > in the values array
> > from the example (the garbage being the physical value not having any
> > bearing on the logical value).
>
> Let's stop talking about "garbage", which is not a technically
> meaningful term.
>
> In this example, the child array is ["joe", null, null, "mark"], but it
> could also have been ["joe", null, "", "mark"] or even ["joe", null,
> "whatever", "mark"].  The important point being that the value #2 in the
> child array is masked by the corresponding null bit in the parent struct
> array.
>

I am really struggling to see how anything I've said is inconsistent with
the spec or what you are saying here.

To recap what I've said:

1. Appending a null sentinel to the values buffer isn't _required_ unless
the type requires it.
Ex: "joemark" in the spec example. No sentinels were appended for the two
null values in the parent struct array.

2. A null value sentinel is _allowed_ in the values buffer even if the type
does not require it.
Ex: "joefoofoomark" extending the spec example, assuming the other
associated buffers (validity, offsets) are correctly constructed.

Is either of those statements incorrect?
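
To make the two cases concrete, here is a minimal pure-Python sketch of how
a reader might decode the "name" child from its buffers. It does not use the
Arrow libraries. The function name decode_utf8_array is hypothetical, the
validity bitmap is simplified to a list of booleans, and the offsets for the
"joefoofoomark" case are my assumption of what "correctly constructed" would
look like. The child is taken to be ["joe", null, null, "mark"] as discussed
above.

```python
def decode_utf8_array(validity, offsets, data):
    """Decode a variable-size binary layout into logical Python values.

    validity: list of bools (stand-in for the validity bitmap)
    offsets:  list of len(validity) + 1 offsets into data
    data:     bytes object (the values buffer)
    """
    out = []
    for i, valid in enumerate(validity):
        if not valid:
            # Null slot: whatever bytes its offset range covers are never read.
            out.append(None)
        else:
            out.append(data[offsets[i]:offsets[i + 1]].decode("utf-8"))
    return out


# Child array ["joe", null, null, "mark"]: slots 1 and 2 are null.
validity = [True, False, False, True]

# Case 1: no sentinel bytes; the null slots have zero-length offset ranges.
compact = decode_utf8_array(validity, [0, 3, 3, 3, 7], b"joemark")

# Case 2: sentinel bytes "foo" are physically present for each null slot,
# and the offsets (assumed here) step over them.
padded = decode_utf8_array(validity, [0, 3, 6, 9, 13], b"joefoofoomark")

print(compact)
print(padded)
```

Both calls yield the same logical values, because a slot whose validity is
false is reported as null regardless of the bytes its offsets point at.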


>
> Regards
>
> Antoine.
>
