kkraus14 commented on code in PR #34972:
URL: https://github.com/apache/arrow/pull/34972#discussion_r1163373562
##########
cpp/src/arrow/c/abi.h:
##########
@@ -106,6 +212,98 @@ struct ArrowArrayStream {
#endif // ARROW_C_STREAM_INTERFACE
+#ifndef ARROW_C_DEVICE_STREAM_INTERFACE
+#define ARROW_C_DEVICE_STREAM_INTERFACE
+
+/// \brief Equivalent to ArrowArrayStream, but for ArrowDeviceArrays.
+///
+/// This stream is intended to provide a stream of data on a single
+/// device, if a producer wants data to be produced on multiple devices
+/// then multiple streams should be provided. One per device.
+struct ArrowDeviceArrayStream {
+ /// \brief The device that this stream produces data on.
+ ///
+ /// All ArrowDeviceArrays that are produced by this
+ /// stream should have the same device_type as set
+ /// here. The device_type needs to be provided here
+ /// so that consumers can provide the correct type
+ /// of queue_ptr when calling get_next.
+ ArrowDeviceType device_type;
+
+ /// \brief Callback to get the stream schema
+ /// (will be the same for all arrays in the stream).
+ ///
+ /// If successful, the ArrowSchema must be released independently from the stream.
+ /// The schema should be accessible via CPU memory.
+ ///
+ /// \param[in] self The ArrowDeviceArrayStream object itself
+ /// \param[out] out C struct to export the schema to
+ /// \return 0 if successful, an `errno`-compatible error code otherwise.
+ int (*get_schema)(struct ArrowDeviceArrayStream* self, struct ArrowSchema* out);
+
+ /// \brief Callback to get the device id for the next array.
+ ///
+ /// This is necessary so that the proper/correct stream pointer can be provided
+ /// to get_next.
+ ///
+ /// The next call to `get_next` should provide an ArrowDeviceArray whose
+ /// device_id matches what is provided here, and whose device_type is the
+ /// same as the device_type member of this stream.
Review Comment:
> Yes, I read this. It looks like solution S1, which is also the one I'm
> proposing, is considered the most flexible (I don't understand the "harder for
> compilers" comment, though).
From: https://github.com/dmlc/dlpack/issues/57#issuecomment-751715168
> It also brings extra burden to the compilers themselves. The compiler will
> need to generate optional synchronization code based on the streams, which is
> non-trivial.
> But you have to actually synchronize on the right stream before being able
> to use the object, right?
Yes, something / someone needs to guarantee that there isn't a data race when
multiple non-blocking streams are in use. That could be done with events,
stream synchronization, or device synchronization.
> How does the user know which stream to synchronize on, if they didn't
> produce the data themselves?
If you're staying within a single framework / library, the expectation is that
the framework / library handles this for the user. When crossing framework /
library boundaries, the expectation is to rely on things like interchange
protocols to handle the synchronization semantics.
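To make the cross-library case concrete, here is a minimal, single-threaded toy
model in plain C of one way an interchange protocol could carry the
synchronization: the producer hands over the data together with an event, and
the consumer waits on that event before reading. Every name below (`MockQueue`,
`MockEvent`, `MockExportedArray`, `producer_export`, `consumer_read`) is
hypothetical and exists only for illustration; none of it is the Arrow, DLPack,
or CUDA API, and the "queue" merely defers work in order to mimic an
asynchronous stream.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a device queue/stream: submitted work runs later, in order. */
#define MAX_PENDING 8
typedef void (*task_fn)(int* data);

typedef struct {
  task_fn tasks[MAX_PENDING];
  int* args[MAX_PENDING];
  size_t submitted; /* number of tasks enqueued so far */
  size_t completed; /* number of tasks that have actually run */
} MockQueue;

/* An event marks a position in a queue, like an event recorded on a stream. */
typedef struct {
  MockQueue* queue;
  size_t position;
} MockEvent;

/* Enqueue work without running it; returns -1 if the queue is full. */
int mock_submit(MockQueue* q, task_fn fn, int* arg) {
  if (q->submitted == MAX_PENDING) return -1;
  q->tasks[q->submitted] = fn;
  q->args[q->submitted] = arg;
  q->submitted++;
  return 0;
}

/* Record an event covering everything submitted to the queue so far. */
MockEvent mock_record(MockQueue* q) {
  MockEvent e = {q, q->submitted};
  return e;
}

/* Block until all work recorded before the event has run (here: run it). */
void mock_wait(const MockEvent* e) {
  MockQueue* q = e->queue;
  while (q->completed < e->position) {
    q->tasks[q->completed](q->args[q->completed]);
    q->completed++;
  }
}

/* Hypothetical exported array: the data plus the event that guards it. */
typedef struct {
  int* data;
  MockEvent ready;
} MockExportedArray;

void fill_with_42(int* data) { *data = 42; }

/* Producer library: enqueue the write asynchronously, then record an event. */
MockExportedArray producer_export(MockQueue* q, int* buffer) {
  MockExportedArray out;
  out.data = buffer;
  mock_submit(q, fill_with_42, buffer);
  out.ready = mock_record(q);
  return out;
}

/* Consumer library: never sees the producer's queue, only the event. */
int consumer_read(const MockExportedArray* arr) {
  mock_wait(&arr->ready); /* synchronize before touching the data */
  return *arr->data;
}
```

In this sketch the consumer does not need to know which queue the producer
used; it only needs the event that travels with the data. Reading `buffer`
directly after `producer_export` and before `consumer_read` would observe the
stale value, which is the data race the synchronization is meant to prevent.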