zeroshade commented on issue #811: URL: https://github.com/apache/arrow-adbc/issues/811#issuecomment-2123708185
> int being basically bool here? Like @paleolimbot said, it would be errno like the C Data Interface callbacks > For cancellation, we'll continue to use the existing method? Yup. The callback can then be called using `ADBC_STATUS_CANCELLED` > I gather that ArrowDeviceArray has different allocation semantics than ArrowArray? I guess I need to take a closer look at it. `ArrowDeviceArray` is a struct containing an `ArrowArray` along with a `device_type`, `device_id`, and a pointer that can be non-null if the data is on a device that requires synchronization (such as a GPU). > Any contract around releasing the ArrowDeviceArrays and ArrowSchema? What if the consumer hasn't released them yet when the producer releases the AsyncArrowStream? `ArrowDeviceArray` and `ArrowSchema` have their own release callbacks. A producer calling the release callback on the `AsyncArrowStream` shouldn't have any effect on the ability to release the `ArrowDeviceArray` or `ArrowSchema`. Their lifetimes should be tied to their own release callbacks. > A minor modification of Matt's proposal for the sake of argument: ... Is the inclusion of a device type there for the caller to tell the producer what device it wants the results to be allocated on? If the producer can't comply with that, does it then call `on_error`? In your suggested interface, are the semantics for `on_error` that the producer calls it in either of the following scenarios: * Producer encountered an error asynchronously and thus calls `on_error` instead of calling `on_schema` or `on_next` * Consumer returned a non-zero errno from `on_schema` or `on_next`, so producer calls `on_error` with the returned error code Are there other scenarios I'm missing? > > I think this would be a bit of a pain to manage. Why not add rows_affected to the on_schema call? > +1 That works. I'm fine with that. Though we'd have to go the route that @lidavidm suggested if we're pushing this upstream into the C Data Interface ABI directly of the `void*` for extensions. I personally am not a big fan of the idea of making a top-level getter that we would pass the `ArrowArrayStream` to in order to retrieve the row count, etc. > // Metadata here would be identical to ArrowSchema::metadata but the handler method > // would need to copy it (and message) if it needs to live after the handler function returns The ADBC Error struct also contains other information such as an optional opaque binary blob for metadata values and such, which would not be able to be passed through this interface. So we'd probably need to expand this `on_error` to also have a `void*` or pass a `char*` + length, or some other combination to allow passing binary pieces of error details. I agree that it might be easier in general if we push this upstream to the Arrow C ABI, but there are also benefits to being able to use the ADBC specific objects in the async interface too. If we do push upstream with this, would we want Async versions of both `ArrowArrayStream` AND `ArrowDeviceArrayStream`? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
