Re: [I] c: define async version of ArrowArrayStream [arrow-adbc]

via GitHub Tue, 21 May 2024 18:28:25 -0700


zeroshade commented on issue #811:
URL: https://github.com/apache/arrow-adbc/issues/811#issuecomment-2123708185


   > int being basically bool here?
   
   Like @paleolimbot said, it would be errno like the C Data Interface callbacks
   
   > For cancellation, we'll continue to use the existing method?
   
   Yup. The callback can then be called using `ADBC_STATUS_CANCELLED`
   
   > I gather that ArrowDeviceArray has different allocation semantics than 
ArrowArray? I guess I need to take a closer look at it.
   
   `ArrowDeviceArray` is a struct containing an `ArrowArray` along with a 
`device_type`, `device_id`, and a pointer that can be non-null if the data is 
on a device that requires synchronization (such as a GPU). 
   
   > Any contract around releasing the ArrowDeviceArrays and ArrowSchema? What 
if the consumer hasn't released them yet when the producer releases the 
AsyncArrowStream?
   
   `ArrowDeviceArray` and `ArrowSchema` have their own release callbacks. A 
producer calling the release callback on the `AsyncArrowStream` shouldn't have 
any effect on the ability to release the `ArrowDeviceArray` or `ArrowSchema`. 
Their lifetimes should be tied to their own release callbacks.
   
   > A minor modification of Matt's proposal for the sake of argument: ...
   
   Is the inclusion of a device type there for the caller to tell the producer 
what device it wants the results to be allocated on? If the producer can't 
comply with that, does it then call `on_error`?
   
   In your suggested interface, are the semantics for `on_error` that the 
producer calls it in either of the following scenarios:
   
   * Producer encountered an error asynchronously and thus calls `on_error` 
instead of calling `on_schema` or `on_next`
   * Consumer returned a non-zero errno from `on_schema` or `on_next`, so 
producer calls `on_error` with the returned error code
   
   Are there other scenarios I'm missing?
   
   > > I think this would be a bit of a pain to manage. Why not add 
rows_affected to the on_schema call?
   > +1
   
   That works. I'm fine with that. Though we'd have to go the route that 
@lidavidm suggested if we're pushing this upstream into the C Data Interface 
ABI directly of the `void*` for extensions. I personally am not a big fan of 
the idea of making a top-level getter that we would pass the `ArrowArrayStream` 
to in order to retrieve the row count, etc.
   
   > // Metadata here would be identical to ArrowSchema::metadata but the 
handler method
   > // would need to copy it (and message) if it needs to live after the 
handler function returns
   
   The ADBC Error struct also contains other information such as an optional 
opaque binary blob for metadata values and such, which would not be able to be 
passed through this interface. So we'd probably need to expand this `on_error` 
to also have a `void*` or pass a `char*` + length, or some other combination to 
allow passing binary pieces of error details.
   
   I agree that it might be easier in general if we push this upstream to the 
Arrow C ABI, but there are also benefits to being able to use the ADBC specific 
objects in the async interface too. If we do push upstream with this, would we 
want Async versions of both `ArrowArrayStream` AND `ArrowDeviceArrayStream`?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [I] c: define async version of ArrowArrayStream [arrow-adbc]

Reply via email to