Re: [I] c: define async version of ArrowArrayStream [arrow-adbc]

via GitHub Tue, 21 May 2024 13:40:18 -0700


zeroshade commented on issue #811:
URL: https://github.com/apache/arrow-adbc/issues/811#issuecomment-2123404768


   @CurtHagenlocher @lidavidm What do you two think about the following idea:
   
   ```c++
   struct AsyncArrowStream {
       int (*on_schema)(struct AsyncArrowStream* self, struct ArrowSchema* out,
                        AdbcStatusCode status, struct AdbcError* error);
       int (*on_next)(struct AsyncArrowStream* self, struct ArrowDeviceArray* out,
                      AdbcStatusCode status, struct AdbcError* error);
   
       void (*release)(struct AsyncArrowStream* self);
       void* private_data;
   };
   ```
   
   Which would be used like:
   
   ```c++
   AdbcStatusCode AdbcStatementExecuteQuery(struct AdbcStatement* statement,
                                            struct AsyncArrowStream* stream_handler,
                                            int64_t* rows_affected,
                                            struct AdbcError* sync_error);
   ```
   
   The caller would populate the `AsyncArrowStream`'s callbacks. `on_schema` would be called as soon as the schema is available, followed by a call to `on_next` as each record batch becomes available. Semantically:
   
   * `private_data` should be populated by the caller with any contextual information needed by the async callbacks.
   * `sync_error` is populated if a synchronous error happens before any asynchronous operations have begun.
   * If an error is encountered asynchronously while trying to get the schema, `on_schema` is called with the status code and error populated and a `nullptr` for the `ArrowSchema`. `on_next` will not be called in this scenario.
   * `rows_affected` should be populated, if available, before the call to `on_schema`.
   * If an error is encountered while retrieving data, `on_next` is called with the error and status code and a `nullptr` for the `ArrowDeviceArray`.
   * To signal the end of the stream, `on_next` is called with `ADBC_STATUS_OK` and a `nullptr` for the `ArrowDeviceArray`.
   * The async callbacks return `int` rather than `void` so that a callback can indicate that it encountered an error on its end and that the producer should cancel and stop calling callback methods.
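   
   To make the consumer side of these semantics concrete, here is a rough sketch of what a caller's callbacks might look like. The ADBC and Arrow types are reduced to stand-ins so that the snippet compiles on its own, `ConsumerState`, `OnSchema`, `OnNext` and `Release` are hypothetical names, and the convention that returning 0 means "keep going" while non-zero means "stop" is an assumption, since the exact return values are not pinned down above.
   
   ```c++
   #include <cassert>
   #include <cstdint>
   
   // Stand-ins for the ADBC / Arrow C Device Data Interface types so that this
   // sketch compiles on its own; real code would include the actual headers.
   typedef uint8_t AdbcStatusCode;
   constexpr AdbcStatusCode ADBC_STATUS_OK = 0;
   struct AdbcError { const char* message; };
   struct ArrowSchema { int64_t n_children; };
   struct ArrowDeviceArray { int64_t length; };  // simplified: only a row count
   
   // The struct as proposed above.
   struct AsyncArrowStream {
     int (*on_schema)(struct AsyncArrowStream* self, struct ArrowSchema* out,
                      AdbcStatusCode status, struct AdbcError* error);
     int (*on_next)(struct AsyncArrowStream* self, struct ArrowDeviceArray* out,
                    AdbcStatusCode status, struct AdbcError* error);
     void (*release)(struct AsyncArrowStream* self);
     void* private_data;
   };
   
   // Hypothetical consumer-side context, reached through private_data.
   struct ConsumerState {
     bool got_schema = false;
     bool finished = false;
     bool failed = false;
     bool released = false;
     int64_t total_rows = 0;
     const char* error_message = nullptr;
   };
   
   // Assumed convention: return 0 to keep receiving callbacks, non-zero to ask
   // the producer to stop.
   int OnSchema(AsyncArrowStream* self, ArrowSchema* out, AdbcStatusCode status,
                AdbcError* error) {
     auto* state = static_cast<ConsumerState*>(self->private_data);
     assert(state != nullptr);
     if (out == nullptr || status != ADBC_STATUS_OK) {
       // Asynchronous failure while resolving the schema; on_next never follows.
       state->failed = true;
       state->error_message = error ? error->message : nullptr;
       return 1;
     }
     state->got_schema = true;
     return 0;
   }
   
   int OnNext(AsyncArrowStream* self, ArrowDeviceArray* out, AdbcStatusCode status,
              AdbcError* error) {
     auto* state = static_cast<ConsumerState*>(self->private_data);
     assert(state != nullptr);
     if (out == nullptr) {
       if (status == ADBC_STATUS_OK) {
         state->finished = true;  // null array + OK status: end of stream
       } else {
         state->failed = true;    // null array + error status: retrieval failed
         state->error_message = error ? error->message : nullptr;
       }
       return 0;
     }
     state->total_rows += out->length;  // a normal record batch
     return 0;
   }
   
   void Release(AsyncArrowStream* self) {
     static_cast<ConsumerState*>(self->private_data)->released = true;
   }
   ```
   
   The caller would fill in an `AsyncArrowStream` with these three functions plus a pointer to a `ConsumerState` before handing it to `AdbcStatementExecuteQuery`. The null-array checks are the only way the consumer distinguishes "end of stream" from "error" here, which is why both `out` and `status` have to be inspected together.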
   
   The following rules should be observed by drivers:
   
   * The async callback for one call should complete before the next async callback is called, avoiding potential race conditions on a single result stream.
   * Once the last callback in a stream (`on_schema` or `on_next`) completes, the producer should then call `release`.
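   
   On the driver side, those two rules could be honoured along the following lines. This is only a sketch: `DriveStream` and the `Trace*` helpers are hypothetical, the ADBC and Arrow types are again stand-ins, the "non-zero return means stop" convention is assumed, and a real driver would presumably run this on a background thread rather than inline.
   
   ```c++
   #include <cassert>
   #include <cstddef>
   #include <cstdint>
   
   // Stand-ins for the real ADBC / Arrow types so the sketch compiles alone.
   typedef uint8_t AdbcStatusCode;
   constexpr AdbcStatusCode ADBC_STATUS_OK = 0;
   struct AdbcError { const char* message; };
   struct ArrowSchema { int64_t n_children; };
   struct ArrowDeviceArray { int64_t length; };
   
   struct AsyncArrowStream {
     int (*on_schema)(struct AsyncArrowStream* self, struct ArrowSchema* out,
                      AdbcStatusCode status, struct AdbcError* error);
     int (*on_next)(struct AsyncArrowStream* self, struct ArrowDeviceArray* out,
                    AdbcStatusCode status, struct AdbcError* error);
     void (*release)(struct AsyncArrowStream* self);
     void* private_data;
   };
   
   // Hypothetical driver-side helper that delivers an already-computed result.
   void DriveStream(AsyncArrowStream* handler, ArrowSchema* schema,
                    ArrowDeviceArray* batches, size_t num_batches) {
     assert(handler != nullptr && handler->release != nullptr);
     // Rule 1: each callback returns before the next one is invoked.
     bool keep_going =
         handler->on_schema(handler, schema, ADBC_STATUS_OK, nullptr) == 0;
     for (size_t i = 0; keep_going && i < num_batches; ++i) {
       keep_going =
           handler->on_next(handler, &batches[i], ADBC_STATUS_OK, nullptr) == 0;
     }
     if (keep_going) {
       // End of stream: ADBC_STATUS_OK with a null array.
       handler->on_next(handler, nullptr, ADBC_STATUS_OK, nullptr);
     }
     // Rule 2: release is called exactly once, after the last callback.
     handler->release(handler);
   }
   
   // Hypothetical recording consumer used to exercise DriveStream.
   struct Trace {
     int schema_calls = 0;
     int batch_calls = 0;
     int end_calls = 0;
     int release_calls = 0;
     int stop_after_batches = -1;  // -1: never ask the producer to stop
   };
   
   int TraceSchema(AsyncArrowStream* self, ArrowSchema*, AdbcStatusCode,
                   AdbcError*) {
     static_cast<Trace*>(self->private_data)->schema_calls++;
     return 0;
   }
   
   int TraceNext(AsyncArrowStream* self, ArrowDeviceArray* out, AdbcStatusCode,
                 AdbcError*) {
     auto* trace = static_cast<Trace*>(self->private_data);
     if (out == nullptr) {
       trace->end_calls++;
       return 0;
     }
     trace->batch_calls++;
     return trace->batch_calls == trace->stop_after_batches ? 1 : 0;
   }
   
   void TraceRelease(AsyncArrowStream* self) {
     static_cast<Trace*>(self->private_data)->release_calls++;
   }
   ```
   
   With a consumer that never asks to stop, `DriveStream` makes one `on_schema` call, one `on_next` call per batch, one end-of-stream `on_next` call, and then a single `release`. If the consumer returns non-zero part-way through, the remaining batches and the end-of-stream call are skipped, but `release` still runs.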
   
   This would work for all of the cases that work with `ArrowArrayStream`s. For the other scenarios (`ExecuteUpdate`, `GetTableSchema`, etc.), something closer to what @CurtHagenlocher was suggesting, with an `ArrowAsyncInfo`-type object, might be more useful?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
