On Mon, Apr 6, 2020 at 9:57 AM Antoine Pitrou <anto...@python.org> wrote:
> > Hello Todd, > > Le 06/04/2020 à 18:18, Todd Lipcon a écrit : > > > > I had a couple questions / items that should be clarified in the spec. > Wes > > suggested I raise them here on dev@: > > > > *1) Should producers expect callers to zero-init structs?* > > IMO, they shouldn't. They should fill the structure exhaustively. > Though ultimately it's a decision made by the implementer of the > producer API. > > > I suppose since it's the "C interface" it's > > probably best to follow the C-style "producer assumes the argument > contains > > uninitialized memory" convention. > > Exactly. > > > *2) Clarify lifetime semantics for nested structures* > > > > In my application, i'd like to allocate all of the children structures of > > an ArrowSchema or ArrowArray out of a memory pool which is stored in the > > private_data field of the top-level struct. As such, my initial > > implementation was to make the 'release' callback on the top-level struct > > delete that memory pool, and set the 'release' callback of all children > > structs to null, since their memory was totally owned by the top-level > > struct. > > > > I figured this approach was OK because the spec says: > > > >> Consumers MUST call a base structure's release callback when they won't > > be using it anymore, but they MUST not call any of its children's release > > callbacks (including the optional dictionary). The producer is > responsible > > for releasing the children. > > > > That advice seems to indicate that I can do whatever I want with the > > release callback of the children, including not setting it. > > ... Except that in this case, moving a child wouldn't be possible. > > > This section of the spec also seems to be a bit in conflict with the > > following: > > > >> It is possible to move a child array, but the parent array MUST be > > released immediately afterwards, as it won't point to a valid child array > > anymore. This satisfies the use case of keeping only a subset of child > > arrays, while releasing the others. > > > > ... because if you have a parent array which owns the memory referred to > by > > the child, then moving the child (with a no-op release callback) followed > > by releasing the parent, you'll end up with an invalid or deallocated > child > > as well. > > I think the solution here is for your memory pool to be > reference-counted, and have each release callback in the array tree > decrement the reference count. Does that sound reasonable to you? > Sure, I can do that. But I imagine consumers may also be a bit surprised by this behavior, that releasing the children doesn't actually free up any memory. The spec should also probably cover thread-safety: if the consumer gets an ArrowArray, is it safe to pass off the children to multiple threads and have them call release() concurrently? In other words, do I need to use a thread-safe reference count? I would guess so. -Todd -- Todd Lipcon Software Engineer, Cloudera