felipecrv commented on code in PR #43632:
URL: https://github.com/apache/arrow/pull/43632#discussion_r1767114741
##########
docs/source/format/CDeviceDataInterface.rst:
##########
@@ -650,6 +652,340 @@ The stream source is not assumed to be thread-safe.
Consumers wanting to
call ``get_next`` from several threads should ensure those calls are
serialized.
+Async Device Stream Interface
+=============================
+
+The :ref:`C stream interface <_c-device-stream-interface>` provides a
synchronous
+API centered around the consumer calling the callback functions to retrieve
Review Comment:
Are these really "call-back" functions? I know the spec and code calls them
callbacks, but "producer functions" will be less confusing here.
```suggestion
API centered around the consumer calling the producer functions to retrieve
```
##########
docs/source/format/CDeviceDataInterface.rst:
##########
@@ -650,6 +652,340 @@ The stream source is not assumed to be thread-safe.
Consumers wanting to
call ``get_next`` from several threads should ensure those calls are
serialized.
+Async Device Stream Interface
+=============================
+
+The :ref:`C stream interface <_c-device-stream-interface>` provides a
synchronous
+API centered around the consumer calling the callback functions to retrieve
+the next record batch. For some bindings, use cases, and interoperability, a
more
+asynchronous, producer-focused interface may be required. These scenarios can
utilize
Review Comment:
What "producer-focused" means here?
##########
docs/source/format/CDeviceDataInterface.rst:
##########
@@ -650,6 +652,340 @@ The stream source is not assumed to be thread-safe.
Consumers wanting to
call ``get_next`` from several threads should ensure those calls are
serialized.
+Async Device Stream Interface
+=============================
+
+The :ref:`C stream interface <_c-device-stream-interface>` provides a
synchronous
+API centered around the consumer calling the callback functions to retrieve
+the next record batch. For some bindings, use cases, and interoperability, a
more
+asynchronous, producer-focused interface may be required. These scenarios can
utilize
+the ``ArrowAsyncDeviceStreamHandler``.
+
+Semantics
+---------
+
+Rather than the producer providing a structure of callbacks for a consumer to
+call and retrieve records, the Async interface is a consumer allocated
structure.
+The consumer allocated struct provides handler callbacks for the producer to
call
+when the schema and chunks of data are available, rather than the consumer
using
+a blocking pull-style iteration.
+
+In addition to the ``ArrowAsyncDeviceStreamHandler``, there are also two
additional
+structs used for the full data flow: ``ArrowAsyncTask`` and
``ArrowAsyncProducer``.
+
+Structure Definition
+--------------------
+
+The C device async stream interface is defined with a single ``struct``
definition:
+
+.. code-block:: c
+
+ #ifndef ARROW_C_ASYNC_STREAM_INTERFACE
+ #define ARROW_C_ASYNC_STREAM_INTERFACE
+
+ struct ArrowAsyncTask {
+ int (*get_data)(struct ArrowArrayTask* self, struct ArrowDeviceArray*
out);
+
+ void* private_data;
+ };
+
+ struct ArrowAsyncProducer {
+ void (*request)(struct ArrowAsyncProducer* self, int64_t n);
+ void (*cancel)(struct ArrowAsyncProducer* self);
+
+ void (*release)(struct ArrowAsyncProducer* self);
+ void* private_data;
+ };
+
+ struct ArrowAsyncDeviceStreamHandler {
+ // handlers
Review Comment:
```suggestion
// consumer-specific handlers
```
##########
docs/source/format/CDeviceDataInterface.rst:
##########
@@ -650,6 +652,340 @@ The stream source is not assumed to be thread-safe.
Consumers wanting to
call ``get_next`` from several threads should ensure those calls are
serialized.
+Async Device Stream Interface
+=============================
+
+The :ref:`C stream interface <_c-device-stream-interface>` provides a
synchronous
+API centered around the consumer calling the callback functions to retrieve
+the next record batch. For some bindings, use cases, and interoperability, a
more
+asynchronous, producer-focused interface may be required. These scenarios can
utilize
+the ``ArrowAsyncDeviceStreamHandler``.
+
+Semantics
+---------
+
+Rather than the producer providing a structure of callbacks for a consumer to
+call and retrieve records, the Async interface is a consumer allocated
structure.
Review Comment:
```suggestion
call and retrieve records, the Async interface is a structure allocated and
populated by the consumer.
```
##########
docs/source/format/CDeviceDataInterface.rst:
##########
@@ -650,6 +652,340 @@ The stream source is not assumed to be thread-safe.
Consumers wanting to
call ``get_next`` from several threads should ensure those calls are
serialized.
+Async Device Stream Interface
+=============================
+
+The :ref:`C stream interface <_c-device-stream-interface>` provides a
synchronous
+API centered around the consumer calling the callback functions to retrieve
+the next record batch. For some bindings, use cases, and interoperability, a
more
+asynchronous, producer-focused interface may be required. These scenarios can
utilize
+the ``ArrowAsyncDeviceStreamHandler``.
+
+Semantics
+---------
+
+Rather than the producer providing a structure of callbacks for a consumer to
+call and retrieve records, the Async interface is a consumer allocated
structure.
+The consumer allocated struct provides handler callbacks for the producer to
call
+when the schema and chunks of data are available, rather than the consumer
using
+a blocking pull-style iteration.
+
+In addition to the ``ArrowAsyncDeviceStreamHandler``, there are also two
additional
+structs used for the full data flow: ``ArrowAsyncTask`` and
``ArrowAsyncProducer``.
+
+Structure Definition
+--------------------
+
+The C device async stream interface is defined with a single ``struct``
definition:
+
+.. code-block:: c
+
+ #ifndef ARROW_C_ASYNC_STREAM_INTERFACE
+ #define ARROW_C_ASYNC_STREAM_INTERFACE
+
+ struct ArrowAsyncTask {
+ int (*get_data)(struct ArrowArrayTask* self, struct ArrowDeviceArray*
out);
+
+ void* private_data;
+ };
+
+ struct ArrowAsyncProducer {
+ void (*request)(struct ArrowAsyncProducer* self, int64_t n);
+ void (*cancel)(struct ArrowAsyncProducer* self);
+
+ void (*release)(struct ArrowAsyncProducer* self);
+ void* private_data;
+ };
+
+ struct ArrowAsyncDeviceStreamHandler {
+ // handlers
+ int (*on_schema)(struct ArrowAsyncDeviceStreamHandler* self,
+ struct ArrowAsyncProducer* producer,
+ struct ArrowSchema* stream_schema, const char*
addl_metadata);
+ int (*on_next_task)(struct ArrowAsyncDeviceStreamHandler* self,
+ struct ArrowAsyncTask* task, const char* metadata);
+ void (*on_error)(struct ArrowAsyncDeviceStreamHandler* self,
+ int code, const char* message, const char* metadata);
+
+ // release callback
+ void (*release)(struct ArrowAsyncDeviceStreamHandler* self);
+
+ // opaque handler-specific data
+ void* private_data;
+ };
+
+ #endif // ARROW_C_ASYNC_STREAM_INTERFACE
+
+.. note::
+ The canonical guard ``ARROW_C_ASYNC_STREAM_INTERFACE`` is meant to avoid
+ duplicate definitions if two projects copy the C async stream interface
+ definitions into their own headers, and a third-party project includes
+ from these two projects. It is therefore important that this guard is kept
+ exactly as-is when these definitions are copied.
+
+The ArrowAsyncDeviceStreamHandler structure
+'''''''''''''''''''''''''''''''''''''''''''
+
+The structure has the following fields:
+
+.. c:member:: int (*ArrowAsyncDeviceStreamHandler.on_schema)(struct
ArrowAsyncDeviceStreamHandler*, struct ArrowAsyncProducer*, struct
ArrowSchema*, const char*)
+
+ *Mandatory.* Handler for receiving the schema of the stream. All records
should
+ match the provided schema. If successful, the function should return 0,
otherwise
+ it should return an ``errno``-compatible error code.
+
+ The ``const char*`` parameter exists for producers to provide any extra
contextual information
+ they want, such as the total number of rows in the stream, statistics, or
otherwise. This is
+ encoded in the same format as :c:member:`ArrowSchema.metadata`. If not
``NULL``,
+ the lifetime is only the scope of the call to this function. A consumer
who wants to maintain
+ the additional metadata beyond the lifetime of this call *MUST* copy the
value themselves.
+
+ Unless the ``on_error`` handler is called, this will always get called
exactly once and will be
+ the first method called on this object. As such the producer *MUST*
provide an ``ArrowAsyncProducer``
+ object when calling this function to allow the consumer to manage
back-pressure and flow control.
+ The producer maintains ownership of the ``ArrowAsyncProducer`` and must
clean it up *after*
+ calling the release callback on this object.
+
+ A producer that receives a non-zero result here must not subsequently call
anything other than
+ the release callback on this object.
+
+.. c:member:: int (*ArrowAsyncDeviceStreamHandler.on_next_task)(struct
ArrowAsyncDeviceStreamHandler*, struct ArrowAsyncTask*, const char*)
+
+ *Mandatory.* Handler to be called when a new record is available for
processing. The
+ schema for each record should be the same as the schema that ``on_schema``
was called with.
+ If successfully handled, the function should return 0, otherwise it should
return an
+ ``errno``-compatible error code.
+
+ Rather than passing the record itself it receives an ``ArrowAsyncTask``
instead to facilitate
+ better consumer-focused thread control as far as receiving the data. A
call to this function
+ simply indicates that data is available via the provided task.
+
+ The producer signals the end of the stream by passing ``NULL`` for the
``ArrowAsyncTask``
+ pointer instead of a valid address. This task object is only valid during
the lifetime of
+ this function call. If the consumer wants to use the task beyond the scope
of this method, it
+ must copy or move its contents to a new ArrowAsyncTask object.
+
+ The ``const char*`` parameter exists for producers to provide any extra
contextual information
+ they want. This is encoded in the same format as
:c:member:`ArrowSchema.metadata`. If not ``NULL``,
+ the lifetime is only the scope of the call to this function. A consumer
who wants to maintain
+ the additional metadata beyond the lifetime of this call *MUST* copy the
value themselves.
+
+ A producer *MUST NOT* call this concurrently from multiple threads.
+
+ The :c:member:`ArrowAsyncProducer.request` callback must be called to
start receiving calls to this
+ handler.
+
+.. c:member:: void (*ArrowAsyncDeviceStreamHandler.on_error)(struct
ArrowAsyncDeviceStreamHandler, int, const char*, const char*)
+
+ *Mandatory.* Handler to be called when an error is encountered by the
producer. After calling
+ this, the ``release`` callback will be called as the last call on this
struct. The parameters
+ are an ``errno``-compatible error code and an optional error message and
metadata.
+
+ If the message and metadata are not ``NULL``, their lifetime is only valid
during the scope
+ of this call. A consumer who wants to maintain these values past the
return of this function
+ *MUST* copy the values themselves.
+
+ If the metadata parameter is not ``NULL``, to provide key-value error
metadata, then it should
+ be encoded identically to the way that metadata is encoded in
:c:member:`ArrowSchema.metadata`.
+
+ It is valid for this to be called by a producer with or without a
preceding call to
+ :c:member:`ArrowAsyncProducer.request`. This callback *MUST NOT* call any
methods of an
+ ``ArrowAsyncProducer`` object.
+
+.. c:member:: void (*ArrowAsyncDeviceStreamHandler.release)(struct
ArrowAsyncDeviceStreamHandler*)
+
+ *Mandatory.* A pointer to a consumer-provided release callback for the
handler.
+
+ It is valid for this to be called by a producer with or without a
preceding call to
+ :c:member:`ArrowAsyncProducer.request`. This must not call any methods of
an ``ArrowAsyncProducer``
+ object.
+
+.. c:member:: void* ArrowAsyncDeviceStreamHandler.private_data
+
+ *Optional.* An opaque pointer to consumer-provided private data.
+
+ Producers *MUST NOT* process this member. Lifetime of this member is
handled by
+ the consumer, and especially by the release callback.
+
+The ArrowAsyncTask structure
+''''''''''''''''''''''''''''
+
+The purpose of using a Task object rather than passing the array directly to
the ``on_next``
+callback is to allow for more complex and efficient thread handling. Utilizing
a Task
+object allows for a producer to separate the "decoding" logic from the I/O,
enabling a
+consumer to avoid transferring data between CPU cores (e.g. from one L1/L2
cache to another).
+
+This producer-provided structure has the following fields:
+
+.. c:member:: int (*ArrowArrayTask.get_data)(struct ArrowArrayTask*, struct
ArrowDeviceArray*)
+
+ *Mandatory.* A callback to populate the provided ``ArrowDeviceArray`` with
the available data.
+ The order of ArrowAsyncTasks provided by the producer enables a consumer to
know the order of
+ the data to process. If the consumer does not care about the data that is
owned by this task,
+ it must still call ``get_data`` so that the producer can perform any
required cleanup. ``NULL``
+ should be passed as the device array pointer to indicate that the consumer
doesn't want the
+ actual data, letting the task perform necessary cleanup.
Review Comment:
Oh, it's not a pure function. Yet another reason to call this `consume_data`
instead of `get_data`.
##########
docs/source/format/CDeviceDataInterface.rst:
##########
@@ -650,6 +652,340 @@ The stream source is not assumed to be thread-safe.
Consumers wanting to
call ``get_next`` from several threads should ensure those calls are
serialized.
+Async Device Stream Interface
+=============================
+
+The :ref:`C stream interface <_c-device-stream-interface>` provides a
synchronous
+API centered around the consumer calling the callback functions to retrieve
+the next record batch. For some bindings, use cases, and interoperability, a
more
+asynchronous, producer-focused interface may be required. These scenarios can
utilize
+the ``ArrowAsyncDeviceStreamHandler``.
+
+Semantics
+---------
+
+Rather than the producer providing a structure of callbacks for a consumer to
+call and retrieve records, the Async interface is a consumer allocated
structure.
+The consumer allocated struct provides handler callbacks for the producer to
call
+when the schema and chunks of data are available, rather than the consumer
using
+a blocking pull-style iteration.
+
+In addition to the ``ArrowAsyncDeviceStreamHandler``, there are also two
additional
+structs used for the full data flow: ``ArrowAsyncTask`` and
``ArrowAsyncProducer``.
+
+Structure Definition
+--------------------
+
+The C device async stream interface is defined with a single ``struct``
definition:
+
+.. code-block:: c
+
+ #ifndef ARROW_C_ASYNC_STREAM_INTERFACE
+ #define ARROW_C_ASYNC_STREAM_INTERFACE
+
+ struct ArrowAsyncTask {
+ int (*get_data)(struct ArrowArrayTask* self, struct ArrowDeviceArray*
out);
+
+ void* private_data;
+ };
+
+ struct ArrowAsyncProducer {
+ void (*request)(struct ArrowAsyncProducer* self, int64_t n);
+ void (*cancel)(struct ArrowAsyncProducer* self);
+
+ void (*release)(struct ArrowAsyncProducer* self);
+ void* private_data;
+ };
+
+ struct ArrowAsyncDeviceStreamHandler {
+ // handlers
+ int (*on_schema)(struct ArrowAsyncDeviceStreamHandler* self,
+ struct ArrowAsyncProducer* producer,
+ struct ArrowSchema* stream_schema, const char*
addl_metadata);
+ int (*on_next_task)(struct ArrowAsyncDeviceStreamHandler* self,
+ struct ArrowAsyncTask* task, const char* metadata);
+ void (*on_error)(struct ArrowAsyncDeviceStreamHandler* self,
+ int code, const char* message, const char* metadata);
+
+ // release callback
+ void (*release)(struct ArrowAsyncDeviceStreamHandler* self);
+
+ // opaque handler-specific data
+ void* private_data;
+ };
+
+ #endif // ARROW_C_ASYNC_STREAM_INTERFACE
+
+.. note::
+ The canonical guard ``ARROW_C_ASYNC_STREAM_INTERFACE`` is meant to avoid
+ duplicate definitions if two projects copy the C async stream interface
+ definitions into their own headers, and a third-party project includes
+ from these two projects. It is therefore important that this guard is kept
+ exactly as-is when these definitions are copied.
+
+The ArrowAsyncDeviceStreamHandler structure
+'''''''''''''''''''''''''''''''''''''''''''
+
+The structure has the following fields:
+
+.. c:member:: int (*ArrowAsyncDeviceStreamHandler.on_schema)(struct
ArrowAsyncDeviceStreamHandler*, struct ArrowAsyncProducer*, struct
ArrowSchema*, const char*)
+
+ *Mandatory.* Handler for receiving the schema of the stream. All records
should
+ match the provided schema. If successful, the function should return 0,
otherwise
+ it should return an ``errno``-compatible error code.
+
+ The ``const char*`` parameter exists for producers to provide any extra
contextual information
+ they want, such as the total number of rows in the stream, statistics, or
otherwise. This is
+ encoded in the same format as :c:member:`ArrowSchema.metadata`. If not
``NULL``,
+ the lifetime is only the scope of the call to this function. A consumer
who wants to maintain
+ the additional metadata beyond the lifetime of this call *MUST* copy the
value themselves.
+
+ Unless the ``on_error`` handler is called, this will always get called
exactly once and will be
+ the first method called on this object. As such the producer *MUST*
provide an ``ArrowAsyncProducer``
+ object when calling this function to allow the consumer to manage
back-pressure and flow control.
+ The producer maintains ownership of the ``ArrowAsyncProducer`` and must
clean it up *after*
+ calling the release callback on this object.
+
+ A producer that receives a non-zero result here must not subsequently call
anything other than
+ the release callback on this object.
+
+.. c:member:: int (*ArrowAsyncDeviceStreamHandler.on_next_task)(struct
ArrowAsyncDeviceStreamHandler*, struct ArrowAsyncTask*, const char*)
+
+ *Mandatory.* Handler to be called when a new record is available for
processing. The
+ schema for each record should be the same as the schema that ``on_schema``
was called with.
+ If successfully handled, the function should return 0, otherwise it should
return an
+ ``errno``-compatible error code.
+
+ Rather than passing the record itself it receives an ``ArrowAsyncTask``
instead to facilitate
+ better consumer-focused thread control as far as receiving the data. A
call to this function
+ simply indicates that data is available via the provided task.
+
+ The producer signals the end of the stream by passing ``NULL`` for the
``ArrowAsyncTask``
+ pointer instead of a valid address. This task object is only valid during
the lifetime of
+ this function call. If the consumer wants to use the task beyond the scope
of this method, it
+ must copy or move its contents to a new ArrowAsyncTask object.
+
+ The ``const char*`` parameter exists for producers to provide any extra
contextual information
+ they want. This is encoded in the same format as
:c:member:`ArrowSchema.metadata`. If not ``NULL``,
+ the lifetime is only the scope of the call to this function. A consumer
who wants to maintain
+ the additional metadata beyond the lifetime of this call *MUST* copy the
value themselves.
+
+ A producer *MUST NOT* call this concurrently from multiple threads.
+
+ The :c:member:`ArrowAsyncProducer.request` callback must be called to
start receiving calls to this
+ handler.
+
+.. c:member:: void (*ArrowAsyncDeviceStreamHandler.on_error)(struct
ArrowAsyncDeviceStreamHandler, int, const char*, const char*)
+
+ *Mandatory.* Handler to be called when an error is encountered by the
producer. After calling
+ this, the ``release`` callback will be called as the last call on this
struct. The parameters
+ are an ``errno``-compatible error code and an optional error message and
metadata.
+
+ If the message and metadata are not ``NULL``, their lifetime is only valid
during the scope
+ of this call. A consumer who wants to maintain these values past the
return of this function
+ *MUST* copy the values themselves.
+
+ If the metadata parameter is not ``NULL``, to provide key-value error
metadata, then it should
+ be encoded identically to the way that metadata is encoded in
:c:member:`ArrowSchema.metadata`.
+
+ It is valid for this to be called by a producer with or without a
preceding call to
+ :c:member:`ArrowAsyncProducer.request`. This callback *MUST NOT* call any
methods of an
+ ``ArrowAsyncProducer`` object.
+
+.. c:member:: void (*ArrowAsyncDeviceStreamHandler.release)(struct
ArrowAsyncDeviceStreamHandler*)
+
+ *Mandatory.* A pointer to a consumer-provided release callback for the
handler.
+
+ It is valid for this to be called by a producer with or without a
preceding call to
+ :c:member:`ArrowAsyncProducer.request`. This must not call any methods of
an ``ArrowAsyncProducer``
+ object.
+
+.. c:member:: void* ArrowAsyncDeviceStreamHandler.private_data
+
+ *Optional.* An opaque pointer to consumer-provided private data.
+
+ Producers *MUST NOT* process this member. Lifetime of this member is
handled by
+ the consumer, and especially by the release callback.
+
+The ArrowAsyncTask structure
+''''''''''''''''''''''''''''
+
+The purpose of using a Task object rather than passing the array directly to
the ``on_next``
+callback is to allow for more complex and efficient thread handling. Utilizing
a Task
+object allows for a producer to separate the "decoding" logic from the I/O,
enabling a
+consumer to avoid transferring data between CPU cores (e.g. from one L1/L2
cache to another).
+
+This producer-provided structure has the following fields:
+
+.. c:member:: int (*ArrowArrayTask.get_data)(struct ArrowArrayTask*, struct
ArrowDeviceArray*)
+
+ *Mandatory.* A callback to populate the provided ``ArrowDeviceArray`` with
the available data.
+ The order of ArrowAsyncTasks provided by the producer enables a consumer to
know the order of
+ the data to process. If the consumer does not care about the data that is
owned by this task,
+ it must still call ``get_data`` so that the producer can perform any
required cleanup. ``NULL``
+ should be passed as the device array pointer to indicate that the consumer
doesn't want the
+ actual data, letting the task perform necessary cleanup.
+
+ If a non-zero value is returned from this, it should be followed only by the
producer calling
+ the ``on_error`` callback of the ``ArrowAsyncDeviceStreamHandler``. Because
calling this method
Review Comment:
This assumes the error returned here can only originate from the process
that produced the task (i.e. the producer). Consuming the data in the task is
forced to be infallible.
##########
docs/source/format/CDeviceDataInterface.rst:
##########
@@ -650,6 +652,340 @@ The stream source is not assumed to be thread-safe.
Consumers wanting to
call ``get_next`` from several threads should ensure those calls are
serialized.
+Async Device Stream Interface
+=============================
+
+The :ref:`C stream interface <_c-device-stream-interface>` provides a
synchronous
+API centered around the consumer calling the callback functions to retrieve
+the next record batch. For some bindings, use cases, and interoperability, a
more
+asynchronous, producer-focused interface may be required. These scenarios can
utilize
+the ``ArrowAsyncDeviceStreamHandler``.
+
+Semantics
+---------
+
+Rather than the producer providing a structure of callbacks for a consumer to
+call and retrieve records, the Async interface is a consumer allocated
structure.
+The consumer allocated struct provides handler callbacks for the producer to
call
+when the schema and chunks of data are available, rather than the consumer
using
+a blocking pull-style iteration.
+
+In addition to the ``ArrowAsyncDeviceStreamHandler``, there are also two
additional
+structs used for the full data flow: ``ArrowAsyncTask`` and
``ArrowAsyncProducer``.
+
+Structure Definition
+--------------------
+
+The C device async stream interface is defined with a single ``struct``
definition:
+
+.. code-block:: c
+
+ #ifndef ARROW_C_ASYNC_STREAM_INTERFACE
+ #define ARROW_C_ASYNC_STREAM_INTERFACE
+
+ struct ArrowAsyncTask {
+ int (*get_data)(struct ArrowArrayTask* self, struct ArrowDeviceArray*
out);
+
+ void* private_data;
+ };
+
+ struct ArrowAsyncProducer {
+ void (*request)(struct ArrowAsyncProducer* self, int64_t n);
+ void (*cancel)(struct ArrowAsyncProducer* self);
+
+ void (*release)(struct ArrowAsyncProducer* self);
+ void* private_data;
+ };
+
+ struct ArrowAsyncDeviceStreamHandler {
+ // handlers
+ int (*on_schema)(struct ArrowAsyncDeviceStreamHandler* self,
+ struct ArrowAsyncProducer* producer,
+ struct ArrowSchema* stream_schema, const char*
addl_metadata);
+ int (*on_next_task)(struct ArrowAsyncDeviceStreamHandler* self,
+ struct ArrowAsyncTask* task, const char* metadata);
+ void (*on_error)(struct ArrowAsyncDeviceStreamHandler* self,
+ int code, const char* message, const char* metadata);
+
+ // release callback
+ void (*release)(struct ArrowAsyncDeviceStreamHandler* self);
+
+ // opaque handler-specific data
+ void* private_data;
+ };
+
+ #endif // ARROW_C_ASYNC_STREAM_INTERFACE
+
+.. note::
+ The canonical guard ``ARROW_C_ASYNC_STREAM_INTERFACE`` is meant to avoid
+ duplicate definitions if two projects copy the C async stream interface
+ definitions into their own headers, and a third-party project includes
+ from these two projects. It is therefore important that this guard is kept
+ exactly as-is when these definitions are copied.
+
+The ArrowAsyncDeviceStreamHandler structure
+'''''''''''''''''''''''''''''''''''''''''''
+
+The structure has the following fields:
+
+.. c:member:: int (*ArrowAsyncDeviceStreamHandler.on_schema)(struct
ArrowAsyncDeviceStreamHandler*, struct ArrowAsyncProducer*, struct
ArrowSchema*, const char*)
+
+ *Mandatory.* Handler for receiving the schema of the stream. All records
should
+ match the provided schema. If successful, the function should return 0,
otherwise
+ it should return an ``errno``-compatible error code.
+
+ The ``const char*`` parameter exists for producers to provide any extra
contextual information
+ they want, such as the total number of rows in the stream, statistics, or
otherwise. This is
+ encoded in the same format as :c:member:`ArrowSchema.metadata`. If not
``NULL``,
+ the lifetime is only the scope of the call to this function. A consumer
who wants to maintain
+ the additional metadata beyond the lifetime of this call *MUST* copy the
value themselves.
+
+ Unless the ``on_error`` handler is called, this will always get called
exactly once and will be
+ the first method called on this object. As such the producer *MUST*
provide an ``ArrowAsyncProducer``
+ object when calling this function to allow the consumer to manage
back-pressure and flow control.
+ The producer maintains ownership of the ``ArrowAsyncProducer`` and must
clean it up *after*
+ calling the release callback on this object.
+
+ A producer that receives a non-zero result here must not subsequently call
anything other than
+ the release callback on this object.
+
+.. c:member:: int (*ArrowAsyncDeviceStreamHandler.on_next_task)(struct
ArrowAsyncDeviceStreamHandler*, struct ArrowAsyncTask*, const char*)
+
+ *Mandatory.* Handler to be called when a new record is available for
processing. The
+ schema for each record should be the same as the schema that ``on_schema``
was called with.
+ If successfully handled, the function should return 0, otherwise it should
return an
+ ``errno``-compatible error code.
+
+ Rather than passing the record itself it receives an ``ArrowAsyncTask``
instead to facilitate
+ better consumer-focused thread control as far as receiving the data. A
call to this function
+ simply indicates that data is available via the provided task.
+
+ The producer signals the end of the stream by passing ``NULL`` for the
``ArrowAsyncTask``
+ pointer instead of a valid address. This task object is only valid during
the lifetime of
+ this function call. If the consumer wants to use the task beyond the scope
of this method, it
+ must copy or move its contents to a new ArrowAsyncTask object.
+
+ The ``const char*`` parameter exists for producers to provide any extra
contextual information
+ they want. This is encoded in the same format as
:c:member:`ArrowSchema.metadata`. If not ``NULL``,
+ the lifetime is only the scope of the call to this function. A consumer
who wants to maintain
+ the additional metadata beyond the lifetime of this call *MUST* copy the
value themselves.
+
+ A producer *MUST NOT* call this concurrently from multiple threads.
+
+ The :c:member:`ArrowAsyncProducer.request` callback must be called to
start receiving calls to this
+ handler.
+
+.. c:member:: void (*ArrowAsyncDeviceStreamHandler.on_error)(struct
ArrowAsyncDeviceStreamHandler, int, const char*, const char*)
+
+ *Mandatory.* Handler to be called when an error is encountered by the
producer. After calling
+ this, the ``release`` callback will be called as the last call on this
struct. The parameters
+ are an ``errno``-compatible error code and an optional error message and
metadata.
+
+ If the message and metadata are not ``NULL``, their lifetime is only valid
during the scope
+ of this call. A consumer who wants to maintain these values past the
return of this function
+ *MUST* copy the values themselves.
+
+ If the metadata parameter is not ``NULL``, to provide key-value error
metadata, then it should
+ be encoded identically to the way that metadata is encoded in
:c:member:`ArrowSchema.metadata`.
+
+ It is valid for this to be called by a producer with or without a
preceding call to
+ :c:member:`ArrowAsyncProducer.request`. This callback *MUST NOT* call any
methods of an
+ ``ArrowAsyncProducer`` object.
+
+.. c:member:: void (*ArrowAsyncDeviceStreamHandler.release)(struct
ArrowAsyncDeviceStreamHandler*)
+
+ *Mandatory.* A pointer to a consumer-provided release callback for the
handler.
+
+ It is valid for this to be called by a producer with or without a
preceding call to
+ :c:member:`ArrowAsyncProducer.request`. This must not call any methods of
an ``ArrowAsyncProducer``
+ object.
+
+.. c:member:: void* ArrowAsyncDeviceStreamHandler.private_data
+
+ *Optional.* An opaque pointer to consumer-provided private data.
+
+ Producers *MUST NOT* process this member. Lifetime of this member is
handled by
+ the consumer, and especially by the release callback.
+
+The ArrowAsyncTask structure
+''''''''''''''''''''''''''''
+
+The purpose of using a Task object rather than passing the array directly to
the ``on_next``
+callback is to allow for more complex and efficient thread handling. Utilizing
a Task
+object allows for a producer to separate the "decoding" logic from the I/O,
enabling a
+consumer to avoid transferring data between CPU cores (e.g. from one L1/L2
cache to another).
+
+This producer-provided structure has the following fields:
+
+.. c:member:: int (*ArrowArrayTask.get_data)(struct ArrowArrayTask*, struct
ArrowDeviceArray*)
+
+ *Mandatory.* A callback to populate the provided ``ArrowDeviceArray`` with
the available data.
+ The order of ArrowAsyncTasks provided by the producer enables a consumer to
know the order of
+ the data to process. If the consumer does not care about the data that is
owned by this task,
+ it must still call ``get_data`` so that the producer can perform any
required cleanup. ``NULL``
+ should be passed as the device array pointer to indicate that the consumer
doesn't want the
+ actual data, letting the task perform necessary cleanup.
+
+ If a non-zero value is returned from this, it should be followed only by the
producer calling
+ the ``on_error`` callback of the ``ArrowAsyncDeviceStreamHandler``. Because
calling this method
+ is likely to be separate from the current control flow, returning a non-zero
value to signal
+ an error occuring allows the current thread to decide handle the case
accordingly, while still
+ allowing all error logging and handling to be centralized in the
+ :c:member:`ArrowAsyncDeviceStreamHandler.on_error` callback.
+
+ Rather than having a separate release callback, any required cleanup should
be performed as part
+ of the invocation of this callback. Ownership of the Array is given to the
pointer passed in as
+ a parameter, and this array must be released separately.
+
+ It is only valid to call this method exactly once.
+
+.. c:member:: void* ArrowArrayTask.private_data
+
+ *Optional.* An opaque pointer to producer-provided private data.
+
+ Consumers *MUST NOT* process this member. Lifetime of this member is handled
by
+ the producer who created this object, and should be cleaned up if necessary
during
+ the call to :c:member:`ArrowArrayTask.get_data`.
Review Comment:
```suggestion
Consumers *MUST NOT* process this member. Lifetime of this member is
initiated by
the producer when creating this object, and cleanup occurs during
the call to :c:member:`ArrowArrayTask.consume_data`.
```
##########
docs/source/format/CDeviceDataInterface.rst:
##########
@@ -650,6 +652,340 @@ The stream source is not assumed to be thread-safe.
Consumers wanting to
call ``get_next`` from several threads should ensure those calls are
serialized.
+Async Device Stream Interface
+=============================
+
+The :ref:`C stream interface <_c-device-stream-interface>` provides a
synchronous
+API centered around the consumer calling the callback functions to retrieve
+the next record batch. For some bindings, use cases, and interoperability, a
more
+asynchronous, producer-focused interface may be required. These scenarios can
utilize
+the ``ArrowAsyncDeviceStreamHandler``.
+
+Semantics
+---------
+
+Rather than the producer providing a structure of callbacks for a consumer to
+call and retrieve records, the Async interface is a consumer allocated
structure.
+The consumer allocated struct provides handler callbacks for the producer to
call
+when the schema and chunks of data are available, rather than the consumer
using
+a blocking pull-style iteration.
+
+In addition to the ``ArrowAsyncDeviceStreamHandler``, there are also two
additional
+structs used for the full data flow: ``ArrowAsyncTask`` and
``ArrowAsyncProducer``.
+
+Structure Definition
+--------------------
+
+The C device async stream interface is defined with a single ``struct``
definition:
+
+.. code-block:: c
+
+ #ifndef ARROW_C_ASYNC_STREAM_INTERFACE
+ #define ARROW_C_ASYNC_STREAM_INTERFACE
+
+ struct ArrowAsyncTask {
+ int (*get_data)(struct ArrowArrayTask* self, struct ArrowDeviceArray*
out);
+
+ void* private_data;
+ };
+
+ struct ArrowAsyncProducer {
+ void (*request)(struct ArrowAsyncProducer* self, int64_t n);
+ void (*cancel)(struct ArrowAsyncProducer* self);
+
+ void (*release)(struct ArrowAsyncProducer* self);
+ void* private_data;
+ };
+
+ struct ArrowAsyncDeviceStreamHandler {
+ // handlers
+ int (*on_schema)(struct ArrowAsyncDeviceStreamHandler* self,
+ struct ArrowAsyncProducer* producer,
+ struct ArrowSchema* stream_schema, const char*
addl_metadata);
+ int (*on_next_task)(struct ArrowAsyncDeviceStreamHandler* self,
+ struct ArrowAsyncTask* task, const char* metadata);
+ void (*on_error)(struct ArrowAsyncDeviceStreamHandler* self,
+ int code, const char* message, const char* metadata);
+
+ // release callback
+ void (*release)(struct ArrowAsyncDeviceStreamHandler* self);
+
+ // opaque handler-specific data
+ void* private_data;
+ };
+
+ #endif // ARROW_C_ASYNC_STREAM_INTERFACE
+
+.. note::
+ The canonical guard ``ARROW_C_ASYNC_STREAM_INTERFACE`` is meant to avoid
+ duplicate definitions if two projects copy the C async stream interface
+ definitions into their own headers, and a third-party project includes
+ from these two projects. It is therefore important that this guard is kept
+ exactly as-is when these definitions are copied.
+
+The ArrowAsyncDeviceStreamHandler structure
+'''''''''''''''''''''''''''''''''''''''''''
+
+The structure has the following fields:
+
+.. c:member:: int (*ArrowAsyncDeviceStreamHandler.on_schema)(struct
ArrowAsyncDeviceStreamHandler*, struct ArrowAsyncProducer*, struct
ArrowSchema*, const char*)
+
+ *Mandatory.* Handler for receiving the schema of the stream. All records
should
+ match the provided schema. If successful, the function should return 0,
otherwise
+ it should return an ``errno``-compatible error code.
Review Comment:
Link to `errno` specification of choice here maybe?
##########
docs/source/format/CDeviceDataInterface.rst:
##########
@@ -650,6 +652,340 @@ The stream source is not assumed to be thread-safe.
Consumers wanting to
call ``get_next`` from several threads should ensure those calls are
serialized.
+Async Device Stream Interface
+=============================
+
+The :ref:`C stream interface <_c-device-stream-interface>` provides a
synchronous
+API centered around the consumer calling the callback functions to retrieve
+the next record batch. For some bindings, use cases, and interoperability, a
more
+asynchronous, producer-focused interface may be required. These scenarios can
utilize
+the ``ArrowAsyncDeviceStreamHandler``.
+
+Semantics
+---------
+
+Rather than the producer providing a structure of callbacks for a consumer to
+call and retrieve records, the Async interface is a consumer allocated
structure.
+The consumer allocated struct provides handler callbacks for the producer to
call
+when the schema and chunks of data are available, rather than the consumer
using
+a blocking pull-style iteration.
+
+In addition to the ``ArrowAsyncDeviceStreamHandler``, there are also two
additional
+structs used for the full data flow: ``ArrowAsyncTask`` and
``ArrowAsyncProducer``.
+
+Structure Definition
+--------------------
+
+The C device async stream interface is defined with a single ``struct``
definition:
+
+.. code-block:: c
+
+ #ifndef ARROW_C_ASYNC_STREAM_INTERFACE
+ #define ARROW_C_ASYNC_STREAM_INTERFACE
+
+ struct ArrowAsyncTask {
+ int (*get_data)(struct ArrowArrayTask* self, struct ArrowDeviceArray*
out);
+
+ void* private_data;
+ };
+
+ struct ArrowAsyncProducer {
+ void (*request)(struct ArrowAsyncProducer* self, int64_t n);
+ void (*cancel)(struct ArrowAsyncProducer* self);
+
+ void (*release)(struct ArrowAsyncProducer* self);
+ void* private_data;
+ };
+
+ struct ArrowAsyncDeviceStreamHandler {
+ // handlers
+ int (*on_schema)(struct ArrowAsyncDeviceStreamHandler* self,
+ struct ArrowAsyncProducer* producer,
+ struct ArrowSchema* stream_schema, const char*
addl_metadata);
+ int (*on_next_task)(struct ArrowAsyncDeviceStreamHandler* self,
+ struct ArrowAsyncTask* task, const char* metadata);
+ void (*on_error)(struct ArrowAsyncDeviceStreamHandler* self,
+ int code, const char* message, const char* metadata);
+
+ // release callback
+ void (*release)(struct ArrowAsyncDeviceStreamHandler* self);
+
+ // opaque handler-specific data
+ void* private_data;
+ };
+
+ #endif // ARROW_C_ASYNC_STREAM_INTERFACE
+
+.. note::
+ The canonical guard ``ARROW_C_ASYNC_STREAM_INTERFACE`` is meant to avoid
+ duplicate definitions if two projects copy the C async stream interface
+ definitions into their own headers, and a third-party project includes
+ from these two projects. It is therefore important that this guard is kept
+ exactly as-is when these definitions are copied.
+
+The ArrowAsyncDeviceStreamHandler structure
+'''''''''''''''''''''''''''''''''''''''''''
+
+The structure has the following fields:
+
+.. c:member:: int (*ArrowAsyncDeviceStreamHandler.on_schema)(struct
ArrowAsyncDeviceStreamHandler*, struct ArrowAsyncProducer*, struct
ArrowSchema*, const char*)
+
+ *Mandatory.* Handler for receiving the schema of the stream. All records
should
+ match the provided schema. If successful, the function should return 0,
otherwise
+ it should return an ``errno``-compatible error code.
+
+ The ``const char*`` parameter exists for producers to provide any extra
contextual information
+ they want, such as the total number of rows in the stream, statistics, or
otherwise. This is
+ encoded in the same format as :c:member:`ArrowSchema.metadata`. If not
``NULL``,
+ the lifetime is only the scope of the call to this function. A consumer
who wants to maintain
+ the additional metadata beyond the lifetime of this call *MUST* copy the
value themselves.
+
+ Unless the ``on_error`` handler is called, this will always get called
exactly once and will be
+ the first method called on this object. As such the producer *MUST*
provide an ``ArrowAsyncProducer``
+ object when calling this function to allow the consumer to manage
back-pressure and flow control.
Review Comment:
"manage back-pressure" is synonym with "flow control in this context.
```suggestion
object when calling this function to allow the consumer to apply
back-pressure and control the flow of data.
```
##########
docs/source/format/CDeviceDataInterface.rst:
##########
@@ -650,6 +652,340 @@ The stream source is not assumed to be thread-safe.
Consumers wanting to
call ``get_next`` from several threads should ensure those calls are
serialized.
+Async Device Stream Interface
+=============================
+
+The :ref:`C stream interface <_c-device-stream-interface>` provides a
synchronous
+API centered around the consumer calling the callback functions to retrieve
+the next record batch. For some bindings, use cases, and interoperability, a
more
+asynchronous, producer-focused interface may be required. These scenarios can
utilize
+the ``ArrowAsyncDeviceStreamHandler``.
+
+Semantics
+---------
+
+Rather than the producer providing a structure of callbacks for a consumer to
+call and retrieve records, the Async interface is a consumer allocated
structure.
+The consumer allocated struct provides handler callbacks for the producer to
call
+when the schema and chunks of data are available, rather than the consumer
using
+a blocking pull-style iteration.
Review Comment:
This repeats what was said on the first line of this paragraph.
```suggestion
when the schema and chunks of data are available.
```
##########
docs/source/format/CDeviceDataInterface.rst:
##########
@@ -650,6 +652,340 @@ The stream source is not assumed to be thread-safe.
Consumers wanting to
call ``get_next`` from several threads should ensure those calls are
serialized.
+Async Device Stream Interface
+=============================
+
+The :ref:`C stream interface <_c-device-stream-interface>` provides a
synchronous
+API centered around the consumer calling the callback functions to retrieve
+the next record batch. For some bindings, use cases, and interoperability, a
more
+asynchronous, producer-focused interface may be required. These scenarios can
utilize
+the ``ArrowAsyncDeviceStreamHandler``.
Review Comment:
This paragraph is good as it states the problem before the solution, but it
leaves the goals somewhat implicit.
```suggestion
the next record batch. For concurrent communication between producer and
consumer,
the ``ArrowAsyncDeviceStreamHandler`` can be used. This interface is
non-opinionated
and may fit into different concurrent communication models.
```
##########
docs/source/format/CDeviceDataInterface.rst:
##########
@@ -650,6 +652,340 @@ The stream source is not assumed to be thread-safe.
Consumers wanting to
call ``get_next`` from several threads should ensure those calls are
serialized.
+Async Device Stream Interface
+=============================
+
+The :ref:`C stream interface <_c-device-stream-interface>` provides a
synchronous
+API centered around the consumer calling the callback functions to retrieve
+the next record batch. For some bindings, use cases, and interoperability, a
more
+asynchronous, producer-focused interface may be required. These scenarios can
utilize
+the ``ArrowAsyncDeviceStreamHandler``.
+
+Semantics
+---------
+
+Rather than the producer providing a structure of callbacks for a consumer to
+call and retrieve records, the Async interface is a consumer allocated
structure.
+The consumer allocated struct provides handler callbacks for the producer to
call
+when the schema and chunks of data are available, rather than the consumer
using
+a blocking pull-style iteration.
+
+In addition to the ``ArrowAsyncDeviceStreamHandler``, there are also two
additional
+structs used for the full data flow: ``ArrowAsyncTask`` and
``ArrowAsyncProducer``.
+
+Structure Definition
+--------------------
+
+The C device async stream interface is defined with a single ``struct``
definition:
Review Comment:
I see 3 structs :)
```suggestion
The C device async stream interface consists of three ``struct`` definitions:
```
##########
docs/source/format/CDeviceDataInterface.rst:
##########
@@ -650,6 +652,340 @@ The stream source is not assumed to be thread-safe.
Consumers wanting to
call ``get_next`` from several threads should ensure those calls are
serialized.
+Async Device Stream Interface
+=============================
+
+The :ref:`C stream interface <_c-device-stream-interface>` provides a
synchronous
+API centered around the consumer calling the callback functions to retrieve
+the next record batch. For some bindings, use cases, and interoperability, a
more
+asynchronous, producer-focused interface may be required. These scenarios can
utilize
+the ``ArrowAsyncDeviceStreamHandler``.
+
+Semantics
+---------
+
+Rather than the producer providing a structure of callbacks for a consumer to
+call and retrieve records, the Async interface is a consumer allocated
structure.
+The consumer allocated struct provides handler callbacks for the producer to
call
+when the schema and chunks of data are available, rather than the consumer
using
+a blocking pull-style iteration.
+
+In addition to the ``ArrowAsyncDeviceStreamHandler``, there are also two
additional
+structs used for the full data flow: ``ArrowAsyncTask`` and
``ArrowAsyncProducer``.
+
+Structure Definition
+--------------------
+
+The C device async stream interface is defined with a single ``struct``
definition:
+
+.. code-block:: c
+
+ #ifndef ARROW_C_ASYNC_STREAM_INTERFACE
+ #define ARROW_C_ASYNC_STREAM_INTERFACE
+
+ struct ArrowAsyncTask {
+ int (*get_data)(struct ArrowArrayTask* self, struct ArrowDeviceArray*
out);
+
+ void* private_data;
+ };
+
+ struct ArrowAsyncProducer {
+ void (*request)(struct ArrowAsyncProducer* self, int64_t n);
+ void (*cancel)(struct ArrowAsyncProducer* self);
+
+ void (*release)(struct ArrowAsyncProducer* self);
+ void* private_data;
+ };
+
+ struct ArrowAsyncDeviceStreamHandler {
+ // handlers
+ int (*on_schema)(struct ArrowAsyncDeviceStreamHandler* self,
+ struct ArrowAsyncProducer* producer,
+ struct ArrowSchema* stream_schema, const char*
addl_metadata);
+ int (*on_next_task)(struct ArrowAsyncDeviceStreamHandler* self,
+ struct ArrowAsyncTask* task, const char* metadata);
+ void (*on_error)(struct ArrowAsyncDeviceStreamHandler* self,
+ int code, const char* message, const char* metadata);
+
+ // release callback
+ void (*release)(struct ArrowAsyncDeviceStreamHandler* self);
+
+ // opaque handler-specific data
+ void* private_data;
+ };
+
+ #endif // ARROW_C_ASYNC_STREAM_INTERFACE
+
+.. note::
+ The canonical guard ``ARROW_C_ASYNC_STREAM_INTERFACE`` is meant to avoid
+ duplicate definitions if two projects copy the C async stream interface
+ definitions into their own headers, and a third-party project includes
+ from these two projects. It is therefore important that this guard is kept
+ exactly as-is when these definitions are copied.
+
+The ArrowAsyncDeviceStreamHandler structure
+'''''''''''''''''''''''''''''''''''''''''''
+
+The structure has the following fields:
+
+.. c:member:: int (*ArrowAsyncDeviceStreamHandler.on_schema)(struct
ArrowAsyncDeviceStreamHandler*, struct ArrowAsyncProducer*, struct
ArrowSchema*, const char*)
+
+ *Mandatory.* Handler for receiving the schema of the stream. All records
should
Review Comment:
```suggestion
*Mandatory.* Handler for receiving the schema of the stream. All
incoming records should
```
##########
docs/source/format/CDeviceDataInterface.rst:
##########
@@ -650,6 +652,340 @@ The stream source is not assumed to be thread-safe.
Consumers wanting to
call ``get_next`` from several threads should ensure those calls are
serialized.
+Async Device Stream Interface
+=============================
+
+The :ref:`C stream interface <_c-device-stream-interface>` provides a
synchronous
+API centered around the consumer calling the callback functions to retrieve
+the next record batch. For some bindings, use cases, and interoperability, a
more
+asynchronous, producer-focused interface may be required. These scenarios can
utilize
+the ``ArrowAsyncDeviceStreamHandler``.
+
+Semantics
+---------
+
+Rather than the producer providing a structure of callbacks for a consumer to
+call and retrieve records, the Async interface is a consumer allocated
structure.
+The consumer allocated struct provides handler callbacks for the producer to
call
+when the schema and chunks of data are available, rather than the consumer
using
+a blocking pull-style iteration.
+
+In addition to the ``ArrowAsyncDeviceStreamHandler``, there are also two
additional
+structs used for the full data flow: ``ArrowAsyncTask`` and
``ArrowAsyncProducer``.
+
+Structure Definition
+--------------------
+
+The C device async stream interface is defined with a single ``struct``
definition:
+
+.. code-block:: c
+
+ #ifndef ARROW_C_ASYNC_STREAM_INTERFACE
+ #define ARROW_C_ASYNC_STREAM_INTERFACE
+
+ struct ArrowAsyncTask {
+ int (*get_data)(struct ArrowArrayTask* self, struct ArrowDeviceArray*
out);
+
+ void* private_data;
+ };
+
+ struct ArrowAsyncProducer {
+ void (*request)(struct ArrowAsyncProducer* self, int64_t n);
+ void (*cancel)(struct ArrowAsyncProducer* self);
+
+ void (*release)(struct ArrowAsyncProducer* self);
+ void* private_data;
+ };
+
+ struct ArrowAsyncDeviceStreamHandler {
+ // handlers
+ int (*on_schema)(struct ArrowAsyncDeviceStreamHandler* self,
+ struct ArrowAsyncProducer* producer,
+ struct ArrowSchema* stream_schema, const char*
addl_metadata);
+ int (*on_next_task)(struct ArrowAsyncDeviceStreamHandler* self,
+ struct ArrowAsyncTask* task, const char* metadata);
+ void (*on_error)(struct ArrowAsyncDeviceStreamHandler* self,
+ int code, const char* message, const char* metadata);
+
+ // release callback
+ void (*release)(struct ArrowAsyncDeviceStreamHandler* self);
+
+ // opaque handler-specific data
+ void* private_data;
+ };
+
+ #endif // ARROW_C_ASYNC_STREAM_INTERFACE
+
+.. note::
+ The canonical guard ``ARROW_C_ASYNC_STREAM_INTERFACE`` is meant to avoid
+ duplicate definitions if two projects copy the C async stream interface
+ definitions into their own headers, and a third-party project includes
+ from these two projects. It is therefore important that this guard is kept
+ exactly as-is when these definitions are copied.
+
+The ArrowAsyncDeviceStreamHandler structure
+'''''''''''''''''''''''''''''''''''''''''''
+
+The structure has the following fields:
+
+.. c:member:: int (*ArrowAsyncDeviceStreamHandler.on_schema)(struct
ArrowAsyncDeviceStreamHandler*, struct ArrowAsyncProducer*, struct
ArrowSchema*, const char*)
+
+ *Mandatory.* Handler for receiving the schema of the stream. All records
should
+ match the provided schema. If successful, the function should return 0,
otherwise
+ it should return an ``errno``-compatible error code.
+
+ The ``const char*`` parameter exists for producers to provide any extra
contextual information
+ they want, such as the total number of rows in the stream, statistics, or
otherwise. This is
+ encoded in the same format as :c:member:`ArrowSchema.metadata`. If not
``NULL``,
+ the lifetime is only the scope of the call to this function. A consumer
who wants to maintain
+ the additional metadata beyond the lifetime of this call *MUST* copy the
value themselves.
+
+ Unless the ``on_error`` handler is called, this will always get called
exactly once and will be
+ the first method called on this object. As such the producer *MUST*
provide an ``ArrowAsyncProducer``
+ object when calling this function to allow the consumer to manage
back-pressure and flow control.
+ The producer maintains ownership of the ``ArrowAsyncProducer`` and must
clean it up *after*
+ calling the release callback on this object.
Review Comment:
```suggestion
calling the release callback on the ``ArrowAsyncDeviceStreamHandler``.
```
##########
docs/source/format/CDeviceDataInterface.rst:
##########
@@ -650,6 +652,340 @@ The stream source is not assumed to be thread-safe.
Consumers wanting to
call ``get_next`` from several threads should ensure those calls are
serialized.
+Async Device Stream Interface
+=============================
+
+The :ref:`C stream interface <_c-device-stream-interface>` provides a
synchronous
+API centered around the consumer calling the callback functions to retrieve
+the next record batch. For some bindings, use cases, and interoperability, a
more
+asynchronous, producer-focused interface may be required. These scenarios can
utilize
+the ``ArrowAsyncDeviceStreamHandler``.
+
+Semantics
+---------
+
+Rather than the producer providing a structure of callbacks for a consumer to
+call and retrieve records, the Async interface is a consumer allocated
structure.
+The consumer allocated struct provides handler callbacks for the producer to
call
+when the schema and chunks of data are available, rather than the consumer
using
+a blocking pull-style iteration.
+
+In addition to the ``ArrowAsyncDeviceStreamHandler``, there are also two
additional
+structs used for the full data flow: ``ArrowAsyncTask`` and
``ArrowAsyncProducer``.
+
+Structure Definition
+--------------------
+
+The C device async stream interface is defined with a single ``struct``
definition:
+
+.. code-block:: c
+
+ #ifndef ARROW_C_ASYNC_STREAM_INTERFACE
+ #define ARROW_C_ASYNC_STREAM_INTERFACE
+
+ struct ArrowAsyncTask {
+ int (*get_data)(struct ArrowArrayTask* self, struct ArrowDeviceArray*
out);
+
+ void* private_data;
+ };
+
+ struct ArrowAsyncProducer {
+ void (*request)(struct ArrowAsyncProducer* self, int64_t n);
+ void (*cancel)(struct ArrowAsyncProducer* self);
+
+ void (*release)(struct ArrowAsyncProducer* self);
+ void* private_data;
+ };
+
+ struct ArrowAsyncDeviceStreamHandler {
+ // handlers
+ int (*on_schema)(struct ArrowAsyncDeviceStreamHandler* self,
+ struct ArrowAsyncProducer* producer,
+ struct ArrowSchema* stream_schema, const char*
addl_metadata);
+ int (*on_next_task)(struct ArrowAsyncDeviceStreamHandler* self,
+ struct ArrowAsyncTask* task, const char* metadata);
+ void (*on_error)(struct ArrowAsyncDeviceStreamHandler* self,
+ int code, const char* message, const char* metadata);
+
+ // release callback
+ void (*release)(struct ArrowAsyncDeviceStreamHandler* self);
+
+ // opaque handler-specific data
+ void* private_data;
+ };
+
+ #endif // ARROW_C_ASYNC_STREAM_INTERFACE
+
+.. note::
+ The canonical guard ``ARROW_C_ASYNC_STREAM_INTERFACE`` is meant to avoid
+ duplicate definitions if two projects copy the C async stream interface
+ definitions into their own headers, and a third-party project includes
+ from these two projects. It is therefore important that this guard is kept
+ exactly as-is when these definitions are copied.
+
+The ArrowAsyncDeviceStreamHandler structure
+'''''''''''''''''''''''''''''''''''''''''''
+
+The structure has the following fields:
+
+.. c:member:: int (*ArrowAsyncDeviceStreamHandler.on_schema)(struct
ArrowAsyncDeviceStreamHandler*, struct ArrowAsyncProducer*, struct
ArrowSchema*, const char*)
+
+ *Mandatory.* Handler for receiving the schema of the stream. All records
should
+ match the provided schema. If successful, the function should return 0,
otherwise
+ it should return an ``errno``-compatible error code.
+
+ The ``const char*`` parameter exists for producers to provide any extra
contextual information
+ they want, such as the total number of rows in the stream, statistics, or
otherwise. This is
+ encoded in the same format as :c:member:`ArrowSchema.metadata`. If not
``NULL``,
+ the lifetime is only the scope of the call to this function. A consumer
who wants to maintain
+ the additional metadata beyond the lifetime of this call *MUST* copy the
value themselves.
+
+ Unless the ``on_error`` handler is called, this will always get called
exactly once and will be
+ the first method called on this object. As such the producer *MUST*
provide an ``ArrowAsyncProducer``
+ object when calling this function to allow the consumer to manage
back-pressure and flow control.
+ The producer maintains ownership of the ``ArrowAsyncProducer`` and must
clean it up *after*
+ calling the release callback on this object.
+
+ A producer that receives a non-zero result here must not subsequently call
anything other than
+ the release callback on this object.
+
+.. c:member:: int (*ArrowAsyncDeviceStreamHandler.on_next_task)(struct
ArrowAsyncDeviceStreamHandler*, struct ArrowAsyncTask*, const char*)
+
+ *Mandatory.* Handler to be called when a new record is available for
processing. The
+ schema for each record should be the same as the schema that ``on_schema``
was called with.
+ If successfully handled, the function should return 0, otherwise it should
return an
+ ``errno``-compatible error code.
+
+ Rather than passing the record itself it receives an ``ArrowAsyncTask``
instead to facilitate
+ better consumer-focused thread control as far as receiving the data. A
call to this function
+ simply indicates that data is available via the provided task.
+
+ The producer signals the end of the stream by passing ``NULL`` for the
``ArrowAsyncTask``
+ pointer instead of a valid address. This task object is only valid during
the lifetime of
+ this function call. If the consumer wants to use the task beyond the scope
of this method, it
+ must copy or move its contents to a new ArrowAsyncTask object.
Review Comment:
Do you think it makes sense to change `get_data` to `consume_data` and
replace "must copy or move" with "must consume the data in the task"?
##########
docs/source/format/CDeviceDataInterface.rst:
##########
@@ -650,6 +652,340 @@ The stream source is not assumed to be thread-safe.
Consumers wanting to
call ``get_next`` from several threads should ensure those calls are
serialized.
+Async Device Stream Interface
+=============================
+
+The :ref:`C stream interface <_c-device-stream-interface>` provides a
synchronous
+API centered around the consumer calling the callback functions to retrieve
+the next record batch. For some bindings, use cases, and interoperability, a
more
+asynchronous, producer-focused interface may be required. These scenarios can
utilize
+the ``ArrowAsyncDeviceStreamHandler``.
+
+Semantics
+---------
+
+Rather than the producer providing a structure of callbacks for a consumer to
+call and retrieve records, the Async interface is a consumer allocated
structure.
+The consumer allocated struct provides handler callbacks for the producer to
call
+when the schema and chunks of data are available, rather than the consumer
using
+a blocking pull-style iteration.
+
+In addition to the ``ArrowAsyncDeviceStreamHandler``, there are also two
additional
+structs used for the full data flow: ``ArrowAsyncTask`` and
``ArrowAsyncProducer``.
+
+Structure Definition
+--------------------
+
+The C device async stream interface is defined with a single ``struct``
definition:
+
+.. code-block:: c
+
+ #ifndef ARROW_C_ASYNC_STREAM_INTERFACE
+ #define ARROW_C_ASYNC_STREAM_INTERFACE
+
+ struct ArrowAsyncTask {
+ int (*get_data)(struct ArrowArrayTask* self, struct ArrowDeviceArray*
out);
+
+ void* private_data;
+ };
+
+ struct ArrowAsyncProducer {
+ void (*request)(struct ArrowAsyncProducer* self, int64_t n);
+ void (*cancel)(struct ArrowAsyncProducer* self);
+
+ void (*release)(struct ArrowAsyncProducer* self);
+ void* private_data;
+ };
+
+ struct ArrowAsyncDeviceStreamHandler {
+ // handlers
+ int (*on_schema)(struct ArrowAsyncDeviceStreamHandler* self,
+ struct ArrowAsyncProducer* producer,
+ struct ArrowSchema* stream_schema, const char*
addl_metadata);
+ int (*on_next_task)(struct ArrowAsyncDeviceStreamHandler* self,
+ struct ArrowAsyncTask* task, const char* metadata);
+ void (*on_error)(struct ArrowAsyncDeviceStreamHandler* self,
+ int code, const char* message, const char* metadata);
+
+ // release callback
+ void (*release)(struct ArrowAsyncDeviceStreamHandler* self);
+
+ // opaque handler-specific data
+ void* private_data;
+ };
+
+ #endif // ARROW_C_ASYNC_STREAM_INTERFACE
+
+.. note::
+ The canonical guard ``ARROW_C_ASYNC_STREAM_INTERFACE`` is meant to avoid
+ duplicate definitions if two projects copy the C async stream interface
+ definitions into their own headers, and a third-party project includes
+ from these two projects. It is therefore important that this guard is kept
+ exactly as-is when these definitions are copied.
+
+The ArrowAsyncDeviceStreamHandler structure
+'''''''''''''''''''''''''''''''''''''''''''
+
+The structure has the following fields:
+
+.. c:member:: int (*ArrowAsyncDeviceStreamHandler.on_schema)(struct
ArrowAsyncDeviceStreamHandler*, struct ArrowAsyncProducer*, struct
ArrowSchema*, const char*)
+
+ *Mandatory.* Handler for receiving the schema of the stream. All records
should
+ match the provided schema. If successful, the function should return 0,
otherwise
+ it should return an ``errno``-compatible error code.
+
+ The ``const char*`` parameter exists for producers to provide any extra
contextual information
+ they want, such as the total number of rows in the stream, statistics, or
otherwise. This is
+ encoded in the same format as :c:member:`ArrowSchema.metadata`. If not
``NULL``,
+ the lifetime is only the scope of the call to this function. A consumer
who wants to maintain
+ the additional metadata beyond the lifetime of this call *MUST* copy the
value themselves.
+
+ Unless the ``on_error`` handler is called, this will always get called
exactly once and will be
+ the first method called on this object. As such the producer *MUST*
provide an ``ArrowAsyncProducer``
+ object when calling this function to allow the consumer to manage
back-pressure and flow control.
+ The producer maintains ownership of the ``ArrowAsyncProducer`` and must
clean it up *after*
+ calling the release callback on this object.
+
+ A producer that receives a non-zero result here must not subsequently call
anything other than
+ the release callback on this object.
+
+.. c:member:: int (*ArrowAsyncDeviceStreamHandler.on_next_task)(struct
ArrowAsyncDeviceStreamHandler*, struct ArrowAsyncTask*, const char*)
+
+ *Mandatory.* Handler to be called when a new record is available for
processing. The
+ schema for each record should be the same as the schema that ``on_schema``
was called with.
+ If successfully handled, the function should return 0, otherwise it should
return an
+ ``errno``-compatible error code.
+
+ Rather than passing the record itself it receives an ``ArrowAsyncTask``
instead to facilitate
+ better consumer-focused thread control as far as receiving the data. A
call to this function
+ simply indicates that data is available via the provided task.
+
+ The producer signals the end of the stream by passing ``NULL`` for the
``ArrowAsyncTask``
+ pointer instead of a valid address. This task object is only valid during
the lifetime of
+ this function call. If the consumer wants to use the task beyond the scope
of this method, it
+ must copy or move its contents to a new ArrowAsyncTask object.
+
+ The ``const char*`` parameter exists for producers to provide any extra
contextual information
+ they want. This is encoded in the same format as
:c:member:`ArrowSchema.metadata`. If not ``NULL``,
+ the lifetime is only the scope of the call to this function. A consumer
who wants to maintain
+ the additional metadata beyond the lifetime of this call *MUST* copy the
value themselves.
+
+ A producer *MUST NOT* call this concurrently from multiple threads.
+
+ The :c:member:`ArrowAsyncProducer.request` callback must be called to
start receiving calls to this
+ handler.
+
+.. c:member:: void (*ArrowAsyncDeviceStreamHandler.on_error)(struct
ArrowAsyncDeviceStreamHandler, int, const char*, const char*)
+
+ *Mandatory.* Handler to be called when an error is encountered by the
producer. After calling
+ this, the ``release`` callback will be called as the last call on this
struct. The parameters
+ are an ``errno``-compatible error code and an optional error message and
metadata.
+
+ If the message and metadata are not ``NULL``, their lifetime is only valid
during the scope
+ of this call. A consumer who wants to maintain these values past the
return of this function
+ *MUST* copy the values themselves.
+
+ If the metadata parameter is not ``NULL``, to provide key-value error
metadata, then it should
+ be encoded identically to the way that metadata is encoded in
:c:member:`ArrowSchema.metadata`.
+
+ It is valid for this to be called by a producer with or without a
preceding call to
+ :c:member:`ArrowAsyncProducer.request`. This callback *MUST NOT* call any
methods of an
+ ``ArrowAsyncProducer`` object.
+
+.. c:member:: void (*ArrowAsyncDeviceStreamHandler.release)(struct
ArrowAsyncDeviceStreamHandler*)
+
+ *Mandatory.* A pointer to a consumer-provided release callback for the
handler.
+
+ It is valid for this to be called by a producer with or without a
preceding call to
+ :c:member:`ArrowAsyncProducer.request`. This must not call any methods of
an ``ArrowAsyncProducer``
+ object.
+
+.. c:member:: void* ArrowAsyncDeviceStreamHandler.private_data
+
+ *Optional.* An opaque pointer to consumer-provided private data.
+
+ Producers *MUST NOT* process this member. Lifetime of this member is
handled by
+ the consumer, and especially by the release callback.
+
+The ArrowAsyncTask structure
+''''''''''''''''''''''''''''
+
+The purpose of using a Task object rather than passing the array directly to
the ``on_next``
+callback is to allow for more complex and efficient thread handling. Utilizing
a Task
+object allows for a producer to separate the "decoding" logic from the I/O,
enabling a
+consumer to avoid transferring data between CPU cores (e.g. from one L1/L2
cache to another).
+
+This producer-provided structure has the following fields:
+
+.. c:member:: int (*ArrowArrayTask.get_data)(struct ArrowArrayTask*, struct
ArrowDeviceArray*)
+
+ *Mandatory.* A callback to populate the provided ``ArrowDeviceArray`` with
the available data.
+ The order of ArrowAsyncTasks provided by the producer enables a consumer to
know the order of
+ the data to process. If the consumer does not care about the data that is
owned by this task,
+ it must still call ``get_data`` so that the producer can perform any
required cleanup. ``NULL``
+ should be passed as the device array pointer to indicate that the consumer
doesn't want the
+ actual data, letting the task perform necessary cleanup.
+
+ If a non-zero value is returned from this, it should be followed only by the
producer calling
+ the ``on_error`` callback of the ``ArrowAsyncDeviceStreamHandler``. Because
calling this method
+ is likely to be separate from the current control flow, returning a non-zero
value to signal
+ an error occuring allows the current thread to decide handle the case
accordingly, while still
+ allowing all error logging and handling to be centralized in the
+ :c:member:`ArrowAsyncDeviceStreamHandler.on_error` callback.
+
+ Rather than having a separate release callback, any required cleanup should
be performed as part
+ of the invocation of this callback. Ownership of the Array is given to the
pointer passed in as
+ a parameter, and this array must be released separately.
+
+ It is only valid to call this method exactly once.
Review Comment:
`consume_data` it is.
##########
docs/source/format/CDeviceDataInterface.rst:
##########
@@ -650,6 +652,340 @@ The stream source is not assumed to be thread-safe.
Consumers wanting to
call ``get_next`` from several threads should ensure those calls are
serialized.
+Async Device Stream Interface
+=============================
+
+The :ref:`C stream interface <_c-device-stream-interface>` provides a
synchronous
+API centered around the consumer calling the callback functions to retrieve
+the next record batch. For some bindings, use cases, and interoperability, a
more
+asynchronous, producer-focused interface may be required. These scenarios can
utilize
+the ``ArrowAsyncDeviceStreamHandler``.
+
+Semantics
+---------
+
+Rather than the producer providing a structure of callbacks for a consumer to
+call and retrieve records, the Async interface is a consumer allocated
structure.
+The consumer allocated struct provides handler callbacks for the producer to
call
+when the schema and chunks of data are available, rather than the consumer
using
+a blocking pull-style iteration.
+
+In addition to the ``ArrowAsyncDeviceStreamHandler``, there are also two
additional
+structs used for the full data flow: ``ArrowAsyncTask`` and
``ArrowAsyncProducer``.
+
+Structure Definition
+--------------------
+
+The C device async stream interface is defined with a single ``struct``
definition:
+
+.. code-block:: c
+
+ #ifndef ARROW_C_ASYNC_STREAM_INTERFACE
+ #define ARROW_C_ASYNC_STREAM_INTERFACE
+
+ struct ArrowAsyncTask {
+ int (*get_data)(struct ArrowArrayTask* self, struct ArrowDeviceArray*
out);
+
+ void* private_data;
+ };
+
+ struct ArrowAsyncProducer {
+ void (*request)(struct ArrowAsyncProducer* self, int64_t n);
+ void (*cancel)(struct ArrowAsyncProducer* self);
+
+ void (*release)(struct ArrowAsyncProducer* self);
+ void* private_data;
+ };
+
+ struct ArrowAsyncDeviceStreamHandler {
+ // handlers
+ int (*on_schema)(struct ArrowAsyncDeviceStreamHandler* self,
+ struct ArrowAsyncProducer* producer,
+ struct ArrowSchema* stream_schema, const char*
addl_metadata);
+ int (*on_next_task)(struct ArrowAsyncDeviceStreamHandler* self,
+ struct ArrowAsyncTask* task, const char* metadata);
+ void (*on_error)(struct ArrowAsyncDeviceStreamHandler* self,
+ int code, const char* message, const char* metadata);
+
+ // release callback
+ void (*release)(struct ArrowAsyncDeviceStreamHandler* self);
+
+ // opaque handler-specific data
+ void* private_data;
+ };
+
+ #endif // ARROW_C_ASYNC_STREAM_INTERFACE
+
+.. note::
+ The canonical guard ``ARROW_C_ASYNC_STREAM_INTERFACE`` is meant to avoid
+ duplicate definitions if two projects copy the C async stream interface
+ definitions into their own headers, and a third-party project includes
+ from these two projects. It is therefore important that this guard is kept
+ exactly as-is when these definitions are copied.
+
+The ArrowAsyncDeviceStreamHandler structure
+'''''''''''''''''''''''''''''''''''''''''''
+
+The structure has the following fields:
+
+.. c:member:: int (*ArrowAsyncDeviceStreamHandler.on_schema)(struct
ArrowAsyncDeviceStreamHandler*, struct ArrowAsyncProducer*, struct
ArrowSchema*, const char*)
+
+ *Mandatory.* Handler for receiving the schema of the stream. All records
should
+ match the provided schema. If successful, the function should return 0,
otherwise
+ it should return an ``errno``-compatible error code.
+
+ The ``const char*`` parameter exists for producers to provide any extra
contextual information
+ they want, such as the total number of rows in the stream, statistics, or
otherwise. This is
+ encoded in the same format as :c:member:`ArrowSchema.metadata`. If not
``NULL``,
+ the lifetime is only the scope of the call to this function. A consumer
who wants to maintain
+ the additional metadata beyond the lifetime of this call *MUST* copy the
value themselves.
+
+ Unless the ``on_error`` handler is called, this will always get called
exactly once and will be
+ the first method called on this object. As such the producer *MUST*
provide an ``ArrowAsyncProducer``
+ object when calling this function to allow the consumer to manage
back-pressure and flow control.
+ The producer maintains ownership of the ``ArrowAsyncProducer`` and must
clean it up *after*
+ calling the release callback on this object.
+
+ A producer that receives a non-zero result here must not subsequently call
anything other than
+ the release callback on this object.
+
+.. c:member:: int (*ArrowAsyncDeviceStreamHandler.on_next_task)(struct
ArrowAsyncDeviceStreamHandler*, struct ArrowAsyncTask*, const char*)
+
+ *Mandatory.* Handler to be called when a new record is available for
processing. The
+ schema for each record should be the same as the schema that ``on_schema``
was called with.
+ If successfully handled, the function should return 0, otherwise it should
return an
+ ``errno``-compatible error code.
+
+ Rather than passing the record itself it receives an ``ArrowAsyncTask``
instead to facilitate
+ better consumer-focused thread control as far as receiving the data. A
call to this function
+ simply indicates that data is available via the provided task.
+
+ The producer signals the end of the stream by passing ``NULL`` for the
``ArrowAsyncTask``
+ pointer instead of a valid address. This task object is only valid during
the lifetime of
+ this function call. If the consumer wants to use the task beyond the scope
of this method, it
+ must copy or move its contents to a new ArrowAsyncTask object.
+
+ The ``const char*`` parameter exists for producers to provide any extra
contextual information
+ they want. This is encoded in the same format as
:c:member:`ArrowSchema.metadata`. If not ``NULL``,
+ the lifetime is only the scope of the call to this function. A consumer
who wants to maintain
+ the additional metadata beyond the lifetime of this call *MUST* copy the
value themselves.
+
+ A producer *MUST NOT* call this concurrently from multiple threads.
+
+ The :c:member:`ArrowAsyncProducer.request` callback must be called to
start receiving calls to this
+ handler.
+
+.. c:member:: void (*ArrowAsyncDeviceStreamHandler.on_error)(struct
ArrowAsyncDeviceStreamHandler, int, const char*, const char*)
+
+ *Mandatory.* Handler to be called when an error is encountered by the
producer. After calling
+ this, the ``release`` callback will be called as the last call on this
struct. The parameters
+ are an ``errno``-compatible error code and an optional error message and
metadata.
+
+ If the message and metadata are not ``NULL``, their lifetime is only valid
during the scope
+ of this call. A consumer who wants to maintain these values past the
return of this function
+ *MUST* copy the values themselves.
+
+ If the metadata parameter is not ``NULL``, to provide key-value error
metadata, then it should
+ be encoded identically to the way that metadata is encoded in
:c:member:`ArrowSchema.metadata`.
+
+ It is valid for this to be called by a producer with or without a
preceding call to
+ :c:member:`ArrowAsyncProducer.request`. This callback *MUST NOT* call any
methods of an
+ ``ArrowAsyncProducer`` object.
+
+.. c:member:: void (*ArrowAsyncDeviceStreamHandler.release)(struct
ArrowAsyncDeviceStreamHandler*)
+
+ *Mandatory.* A pointer to a consumer-provided release callback for the
handler.
+
+ It is valid for this to be called by a producer with or without a
preceding call to
+ :c:member:`ArrowAsyncProducer.request`. This must not call any methods of
an ``ArrowAsyncProducer``
+ object.
+
+.. c:member:: void* ArrowAsyncDeviceStreamHandler.private_data
+
+ *Optional.* An opaque pointer to consumer-provided private data.
+
+ Producers *MUST NOT* process this member. Lifetime of this member is
handled by
+ the consumer, and especially by the release callback.
+
+The ArrowAsyncTask structure
+''''''''''''''''''''''''''''
+
+The purpose of using a Task object rather than passing the array directly to
the ``on_next``
+callback is to allow for more complex and efficient thread handling. Utilizing
a Task
+object allows for a producer to separate the "decoding" logic from the I/O,
enabling a
+consumer to avoid transferring data between CPU cores (e.g. from one L1/L2
cache to another).
+
+This producer-provided structure has the following fields:
+
+.. c:member:: int (*ArrowArrayTask.get_data)(struct ArrowArrayTask*, struct
ArrowDeviceArray*)
+
+ *Mandatory.* A callback to populate the provided ``ArrowDeviceArray`` with
the available data.
+ The order of ArrowAsyncTasks provided by the producer enables a consumer to
know the order of
+ the data to process. If the consumer does not care about the data that is
owned by this task,
+ it must still call ``get_data`` so that the producer can perform any
required cleanup. ``NULL``
+ should be passed as the device array pointer to indicate that the consumer
doesn't want the
+ actual data, letting the task perform necessary cleanup.
+
+ If a non-zero value is returned from this, it should be followed only by the
producer calling
+ the ``on_error`` callback of the ``ArrowAsyncDeviceStreamHandler``. Because
calling this method
+ is likely to be separate from the current control flow, returning a non-zero
value to signal
+ an error occuring allows the current thread to decide handle the case
accordingly, while still
+ allowing all error logging and handling to be centralized in the
+ :c:member:`ArrowAsyncDeviceStreamHandler.on_error` callback.
+
+ Rather than having a separate release callback, any required cleanup should
be performed as part
+ of the invocation of this callback. Ownership of the Array is given to the
pointer passed in as
+ a parameter, and this array must be released separately.
Review Comment:
Ok. Now this must definitely be renamed to `consume_data` as I suspected
earlier.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]