On Fri, Feb 18, 2022 at 2:32 PM Antoine Pitrou <anto...@python.org> wrote:
>
> On 18/02/2022 at 20:26, Phillip Cloud wrote:
> > On Fri, Feb 18, 2022 at 2:06 PM Antoine Pitrou <anto...@python.org> wrote:
> >
> >> On 18/02/2022 at 20:01, Phillip Cloud wrote:
> >>> I think I'm confused by where this appended value lives. Is it only a
> >>> logical value or does the value show up in memory?
> >>
> >> The logical value is null. The appended value is only a physical value
> >> that shows up in memory but doesn't have any bearing on the logical value.
> >>
> >
> > Yes, but where does that value reside? Does it depend on the array type?
> > Is it garbage in the values buffer? Something else?
>
> Well, it obviously depends on the child array type, so it's difficult to
> answer more precisely.
>
> For example, if your child array is a fixed-width primitive array, then
> you can append a value of the given width, with whatever value. You can
> also append a null in the child, but you still have to append to the
> values buffer anyway (since it's a fixed-width type).
>

Let's constrain it to the example in the spec.

> >>> For example, appending another null to the name field is only going to
> >>> change the validity map, offsets array and length and there will not be
> >>> any changes to the values buffer.
> >>
> >> I may be missing some context, but what is the "name field" here?
> >>
> >
> > The field in the example in the spec:
> > https://arrow.apache.org/docs/format/Columnar.html#struct-layout
> >
> > [...]
> >
> > If that were the case, I would expect garbage in between "joe" and "mark"
> > in the values array from the example (the garbage being the physical
> > value not having any bearing on the logical value).
>
> Let's stop talking about "garbage", which is not a technically
> meaningful term.
>
> In this example, the child array is ["joe", null, null, "mark"], but it
> could also have been ["joe", null, "", "mark"] or even ["joe", null,
> "whatever", "mark"].
> The important point being that the value #2 in the child array is
> masked by the corresponding null bit in the parent struct array.
>

I am really struggling to see how anything I've said is inconsistent with
the spec or with what you are saying here.

To recap what I've said:

1. Appending a null sentinel to the values buffer isn't _required_ unless
   the type requires it. Ex: "joemark" in the spec example. No sentinels
   were appended for the two null values in the parent struct array.
2. A null value sentinel is _allowed_ in the values buffer even if the
   type does not require it. Ex: "joefoofoomark" extending the spec
   example, assuming the other associated buffers (validity, offsets) are
   correctly constructed.

Is either of those statements incorrect?

>
> Regards
>
> Antoine.
>
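To make the two statements and the masking point concrete, here is a minimal pure-Python sketch of the `name` child from the spec example. It does not use the Arrow API: `decode_names` and `decode_struct` are hypothetical helpers written for illustration, and validity bitmaps are modeled as lists of 0/1 instead of packed bits. The "joefoofoomark" and "joewhatevermark" buffers are the hypothetical variants discussed in this thread, not layouts taken from the spec.

```python
def decode_names(validity, offsets, values):
    """Decode a variable-size binary child array into Python values.

    A slot whose validity bit is 0 is logically null, whatever bytes
    its offsets happen to span in the values buffer.
    """
    out = []
    for i, valid in enumerate(validity):
        if valid:
            out.append(values[offsets[i]:offsets[i + 1]].decode("utf-8"))
        else:
            out.append(None)
    return out


def decode_struct(struct_validity, names):
    """Apply the parent struct's validity on top of the child values."""
    return [{"name": n} if valid else None
            for valid, n in zip(struct_validity, names)]


child_validity = [1, 0, 0, 1]

# Statement 1: no bytes are appended for the null slots ("joemark").
compact = decode_names(child_validity, [0, 3, 3, 3, 7], b"joemark")

# Statement 2: bytes are present for the null slots but are never read
# ("joefoofoomark", an assumed variant with matching offsets).
padded = decode_names(child_validity, [0, 3, 6, 9, 13], b"joefoofoomark")

# Masking: the child holds a valid "whatever" in slot 2, but the parent
# struct is null there, so the value has no bearing on the logical result.
unmasked_child = decode_names([1, 0, 1, 1], [0, 3, 3, 11, 15],
                              b"joewhatevermark")
structs = decode_struct([1, 1, 0, 1], unmasked_child)

print(compact)
print(padded)
print(structs)
```

In this model both physical encodings decode to the same logical child array, and the parent's null bit hides slot 2 of the child regardless of what the child stores there.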