pitrou commented on code in PR #43632:
URL: https://github.com/apache/arrow/pull/43632#discussion_r1757469760


##########
docs/source/format/CDeviceDataInterface.rst:
##########
@@ -650,6 +652,331 @@ The stream source is not assumed to be thread-safe. 
Consumers wanting to
 call ``get_next`` from several threads should ensure those calls are
 serialized.
 
+Async Device Stream Interface
+=============================
+
+The :ref:`C stream interface <_c-device-stream-interface>` provides a 
synchronous
+API centered around the consumer calling the callback functions to retrieve
+the next record batch. For some bindings, use cases, and interoperability, a 
more
+asynchronous, producer-focused interface may be required. These scenarios can 
utilize
+the ``ArrowAsyncDeviceStreamHandler``.
+
+Semantics
+---------
+
+Rather than the producer providing a structure of callbacks for a consumer to
+call and retrieve records, the Async interface is a consumer allocated 
structure.
+The consumer allocated struct provides handler callbacks for the producer to 
call
+when the schema and chunks of data are available, rather than the consumer 
using
+a blocking pull-style iteration.
+
+In addition to the ``ArrowAsyncDeviceStreamHandler``, there are also two 
additional
+structs used for the full data flow: ``ArrowAsyncTask`` and 
``ArrowAsyncProducer``.
+
+Structure Definition
+--------------------
+
+The C device async stream interface is defined with a single ``struct`` 
definition:
+
+.. code-block:: c
+
+    #ifndef ARROW_C_ASYNC_STREAM_INTERFACE
+    #define ARROW_C_ASYNC_STREAM_INTERFACE
+
+    struct ArrowAsyncTask {
+      int (*get_data)(struct ArrowArrayTask* self, struct ArrowDeviceArray* 
out);
+
+      void* private_data;
+    };
+
+    struct ArrowAsyncProducer {
+      void (*request)(struct ArrowAsyncProducer* self, uint64_t n);
+      void (*cancel)(struct ArrowAsyncProducer* self);
+
+      void (*release)(struct ArrowAsyncProducer* self);
+      void* private_data;
+    };
+
+    struct ArrowAsyncDeviceStreamHandler {
+      // handlers
+      int (*on_schema)(struct ArrowAsyncDeviceStreamHandler* self,
+                       struct ArrowAsyncProducer* producer,
+                       struct ArrowSchema* stream_schema, const char* 
addl_metadata);
+      int (*on_next_task)(struct ArrowAsyncDeviceStreamHandler* self,
+                          struct ArrowAsyncTask* task, const char* metadata);
+      void (*on_error)(struct ArrowAsyncDeviceStreamHandler* self,
+                       int code, const char* message, const char* metadata);
+
+      // release callback
+      void (*release)(struct ArrowAsyncDeviceStreamHandler* self);
+
+      // opaque handler-specific data
+      void* private_data;
+    };
+
+    #endif  // ARROW_C_ASYNC_STREAM_INTERFACE
+
+.. note::
+    The canonical guard ``ARROW_C_ASYNC_STREAM_INTERFACE`` is meant to avoid
+    duplicate definitions if two projects copy the C async stream interface
+    definitions into their own headers, and a third-party project includes
+    from these two projects. It is therefore important that this guard is kept
+    exactly as-is when these definitions are copied.
+
+The ArrowAsyncDeviceStreamHandler structure
+'''''''''''''''''''''''''''''''''''''''''''
+
+The structure has the following fields:
+
+.. c:member:: int (*ArrowAsyncDeviceStreamHandler.on_schema)(struct 
ArrowAsyncDeviceStreamHandler*, struct ArrowAsyncProducer*, struct 
ArrowSchema*, const char*)
+
+    *Mandatory.* Handler for receiving the schema of the stream. All records 
should
+    match the provided schema. If successful, the function should return 0, 
otherwise
+    it should return an ``errno``-compatible error code.
+
+    The ``const char*`` parameter exists for producers to provide any extra 
contextual information
+    they want, such as the total number of rows in the stream, statistics, or 
otherwise. This is
+    encoded in the same format as :c:member:`ArrowSchema.metadata`. If not 
``NULL``,
+    the lifetime is only the scope of the call to this function. A consumer 
who wants to maintain
+    the additional metadata beyond the lifetime of this call *MUST* copy the 
value themselves.
+
+    Unless the ``on_error`` handler is called, this will always get called 
exactly once and will be
+    the first method called on this object. As such the producer *MUST* 
provide an ``ArrowAsyncProducer``
+    object when calling this function to allow the consumer to manage 
back-pressure and flow control.
+    The producer maintains ownership of the ``ArrowAsyncProducer`` and must 
clean it up before or after
+    calling the release callback on this object.
+
+    A producer that receives a non-zero result here must not subsequently call 
anything other than
+    the release callback on this object.
+
+.. c:member:: int (*ArrowAsyncDeviceStreamHandler.on_next_task)(struct 
ArrowAsyncDeviceStreamHandler*, struct ArrowAsyncTask*, const char*)

Review Comment:
   Why is it called a "task"? It's an array, so perhaps `on_next_array`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to