On Mon, Apr 6, 2020 at 9:57 AM Antoine Pitrou <anto...@python.org> wrote:

>
> Hello Todd,
>
> Le 06/04/2020 à 18:18, Todd Lipcon a écrit :
> >
> > I had a couple questions / items that should be clarified in the spec.
> Wes
> > suggested I raise them here on dev@:
> >
> > *1) Should producers expect callers to zero-init structs?*
>
> IMO, they shouldn't.  They should fill the structure exhaustively.
> Though ultimately it's a decision made by the implementer of the
> producer API.
>
> > I suppose since it's the "C interface" it's
> > probably best to follow the C-style "producer assumes the argument
> contains
> > uninitialized memory" convention.
>
> Exactly.
>
> > *2) Clarify lifetime semantics for nested structures*
> >
> > In my application, i'd like to allocate all of the children structures of
> > an ArrowSchema or ArrowArray out of a memory pool which is stored in the
> > private_data field of the top-level struct. As such, my initial
> > implementation was to make the 'release' callback on the top-level struct
> > delete that memory pool, and set the 'release' callback of all children
> > structs to null, since their memory was totally owned by the top-level
> > struct.
> >
> > I figured this approach was OK because the spec says:
> >
> >>  Consumers MUST call a base structure's release callback when they won't
> > be using it anymore, but they MUST not call any of its children's release
> > callbacks (including the optional dictionary). The producer is
> responsible
> > for releasing the children.
> >
> > That advice seems to indicate that I can do whatever I want with the
> > release callback of the children, including not setting it.
>
> ... Except that in this case, moving a child wouldn't be possible.
>
> > This section of the spec also seems to be a bit in conflict with the
> > following:
> >
> >> It is possible to move a child array, but the parent array MUST be
> > released immediately afterwards, as it won't point to a valid child array
> > anymore. This satisfies the use case of keeping only a subset of child
> > arrays, while releasing the others.
> >
> > ... because if you have a parent array which owns the memory referred to
> by
> > the child, then moving the child (with a no-op release callback) followed
> > by releasing the parent, you'll end up with an invalid or deallocated
> child
> > as well.
>
> I think the solution here is for your memory pool to be
> reference-counted, and have each release callback in the array tree
> decrement the reference count.  Does that sound reasonable to you?
>

Sure, I can do that. But I imagine consumers may also be a bit surprised by
this behavior, that releasing the children doesn't actually free up any
memory.

The spec should also probably cover thread-safety: if the consumer gets an
ArrowArray, is it safe to pass off the children to multiple threads and
have them call release() concurrently? In other words, do I need to use a
thread-safe reference count? I would guess so.

-Todd
-- 
Todd Lipcon
Software Engineer, Cloudera

Reply via email to