[GitHub] [arrow] kkraus14 commented on a diff in pull request #34972: GH-34971: [Format] Enhance C-Data API to support non-cpu cases

via GitHub Tue, 11 Apr 2023 15:01:55 -0700


kkraus14 commented on code in PR #34972:
URL: https://github.com/apache/arrow/pull/34972#discussion_r1163373562



##########
cpp/src/arrow/c/abi.h:
##########
@@ -106,6 +212,98 @@ struct ArrowArrayStream {
 
 #endif  // ARROW_C_STREAM_INTERFACE
 
+#ifndef ARROW_C_DEVICE_STREAM_INTERFACE
+#define ARROW_C_DEVICE_STREAM_INTERFACE
+
+/// \brief Equivalent to ArrowArrayStream, but for ArrowDeviceArrays.
+///
+/// This stream is intended to provide a stream of data on a single
+/// device, if a producer wants data to be produced on multiple devices
+/// then multiple streams should be provided. One per device.
+struct ArrowDeviceArrayStream {
+  /// \brief The device that this stream produces data on.
+  ///
+  /// All ArrowDeviceArrays that are produced by this
+  /// stream should have the same device_type as set
+  /// here. The device_type needs to be provided here
+  /// so that consumers can provide the correct type
+  /// of queue_ptr when calling get_next.
+  ArrowDeviceType device_type;
+
+  /// \brief Callback to get the stream schema
+  /// (will be the same for all arrays in the stream).
+  ///
+  /// If successful, the ArrowSchema must be released independently from the stream.
+  /// The schema should be accessible via CPU memory.
+  ///
+  /// \param[in] self The ArrowDeviceArrayStream object itself
+  /// \param[out] out C struct to export the schema to
+  /// \return 0 if successful, an `errno`-compatible error code otherwise.
+  int (*get_schema)(struct ArrowDeviceArrayStream* self, struct ArrowSchema* out);
+
+  /// \brief Callback to get the device id for the next array.
+  ///
+  /// This is necessary so that the proper/correct stream pointer can be provided
+  /// to get_next.
+  ///
+  /// The next call to `get_next` should provide an ArrowDeviceArray whose
+  /// device_id matches what is provided here, and whose device_type is the
+  /// same as the device_type member of this stream.

Review Comment:
   > Yes, I read this. It looks like solution S1, which is also the one I'm proposing, is considered the most flexible (I don't understand the "harder for compilers" comment, though).
   
   From: https://github.com/dmlc/dlpack/issues/57#issuecomment-751715168
   > It also brings extra burden to the compilers themselves. The compiler will need to generate optional synchronization code based on the streams, which is non-trivial.
   
   > But you have to actually synchronize on the right stream before being able to use the object, right?
   
   Yes, something / someone needs to guarantee that there isn't a data race with regard to using multiple non-blocking streams. That could be done with events, stream synchronization, or device synchronization.
   
   > How does the user know which stream to synchronize on, if they didn't produce the data themselves?
   
   If you're staying within your framework / library, then the expectation is for the framework / library to handle things for the user. If you're crossing framework / library boundaries, then the expectation is to rely on things like interchange protocols to handle the synchronization semantics.
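
To make the cross-library case concrete, here is a minimal sketch in plain C of what "the interchange protocol handles the synchronization semantics" could look like with the event option. Everything in it is hypothetical and for illustration only: `MockEvent`, `MockDeviceArray`, `mock_event_record`, `mock_event_wait` and `consume_sum` are not part of `abi.h` or of any GPU runtime, and the event is simulated with a flag. The point is only that the producer hands over a "ready" handle with the data, so the consumer never needs to know which stream produced it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a device event; a real GPU runtime
 * would supply its own event type. */
typedef struct {
  int signaled; /* set once the producer's last write has completed */
} MockEvent;

/* Hypothetical handoff record: the data plus the event the producer
 * recorded after writing it. Not the struct proposed in the PR. */
typedef struct {
  const int32_t* data;
  int64_t length;
  int64_t device_id;
  MockEvent* ready; /* NULL means no synchronization is required */
} MockDeviceArray;

/* Producer side: mark the data as ready. */
static void mock_event_record(MockEvent* ev) { ev->signaled = 1; }

/* Consumer side: returns 0 if the event has been signaled, -1 otherwise.
 * Assumption: a real runtime would block (or enqueue a wait on the
 * consumer's own stream) instead of returning an error. */
static int mock_event_wait(const MockEvent* ev) {
  return ev->signaled ? 0 : -1;
}

/* The consumer synchronizes on the handle it was given and only then
 * reads the data. Returns 0 on success, -1 if the data is not ready. */
static int consume_sum(const MockDeviceArray* arr, int64_t* out_sum) {
  assert(arr != NULL && out_sum != NULL);
  if (arr->ready != NULL && mock_event_wait(arr->ready) != 0) {
    return -1; /* reading now would race with the producer */
  }
  int64_t sum = 0;
  for (int64_t i = 0; i < arr->length; ++i) {
    sum += arr->data[i];
  }
  *out_sum = sum;
  return 0;
}
```

With stream or device synchronization instead of an event, the producer would have to block before handing the data over, or the consumer would need the producer's stream pointer, which is the part the user typically cannot know.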



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
