[GitHub] [arrow] pitrou commented on a diff in pull request #34972: GH-34971: [Format] Enhance C-Data API to support non-cpu cases

via GitHub Tue, 11 Apr 2023 14:32:16 -0700


pitrou commented on code in PR #34972:
URL: https://github.com/apache/arrow/pull/34972#discussion_r1163354415



##########
cpp/src/arrow/c/abi.h:
##########
@@ -106,6 +212,98 @@ struct ArrowArrayStream {
 
 #endif  // ARROW_C_STREAM_INTERFACE
 
+#ifndef ARROW_C_DEVICE_STREAM_INTERFACE
+#define ARROW_C_DEVICE_STREAM_INTERFACE
+
+/// \brief Equivalent to ArrowArrayStream, but for ArrowDeviceArrays.
+///
+/// This stream is intended to provide a stream of data on a single
+/// device, if a producer wants data to be produced on multiple devices
+/// then multiple streams should be provided. One per device.
+struct ArrowDeviceArrayStream {
+  /// \brief The device that this stream produces data on.
+  ///
+  /// All ArrowDeviceArrays that are produced by this
+  /// stream should have the same device_type as set
+  /// here. The device_type needs to be provided here
+  /// so that consumers can provide the correct type
+  /// of queue_ptr when calling get_next.
+  ArrowDeviceType device_type;
+
+  /// \brief Callback to get the stream schema
+  /// (will be the same for all arrays in the stream).
+  ///
+  /// If successful, the ArrowSchema must be released independently from the stream.
+  /// The schema should be accessible via CPU memory.
+  ///
+  /// \param[in] self The ArrowDeviceArrayStream object itself
+  /// \param[out] out C struct to export the schema to
+  /// \return 0 if successful, an `errno`-compatible error code otherwise.
+  int (*get_schema)(struct ArrowDeviceArrayStream* self, struct ArrowSchema* out);
+
+  /// \brief Callback to get the device id for the next array.
+  ///
+  /// This is necessary so that the proper/correct stream pointer can be provided
+  /// to get_next.
+  ///
+  /// The next call to `get_next` should provide an ArrowDeviceArray whose
+  /// device_id matches what is provided here, and whose device_type is the
+  /// same as the device_type member of this stream.

Review Comment:
   Thanks for the pointers.
   
   > I think this comment does a good job of summarizing the options that were considered: [dmlc/dlpack#57 (comment)](https://github.com/dmlc/dlpack/issues/57#issuecomment-753220425)
   
   Yes, I read this. It looks like solution S1, which is also the one I'm 
proposing, is considered the most flexible (I don't understand the "harder for 
compilers" comment, though).
   
   > And then this comment summarizes discussion of those options: [dmlc/dlpack#57 (comment)](https://github.com/dmlc/dlpack/issues/57#issuecomment-774482973)
   
   I read this too, but it doesn't actually mention S1, for reasons I wasn't 
able to understand.
   
   > In most cases libraries don't associate a stream with the object since it's valid to use multiple streams with a single object.
   
   But you have to actually synchronize on the right stream before being able 
to use the object, right? How does the user know which stream to synchronize 
on, if they didn't produce the data themselves?
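
To make the question concrete, here is a minimal sketch of one possible answer: the exported object carries a handle to the stream it was produced on, and the consumer synchronizes on that handle before reading. Everything in it is hypothetical. `MockStream`, `ExportedBuffer`, `mock_stream_synchronize`, `consume_sum` and `run_example` are stand-ins invented for illustration; they are not part of this PR, the Arrow C Data Interface, DLPack, or any device runtime, and the "stream" is a plain struct rather than a real device queue.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a device stream (e.g. a CUDA stream).
 * "pending" counts work enqueued by the producer that has not finished. */
typedef struct MockStream {
  int pending;
} MockStream;

/* Hypothetical exported object: the data plus the stream it was produced on.
 * This is NOT the struct proposed in the PR, only an illustration. */
typedef struct ExportedBuffer {
  const int32_t* data;
  int64_t length;
  MockStream* producer_stream; /* NULL means "already safe to read" */
} ExportedBuffer;

/* Stand-in for a blocking stream synchronization call. */
static void mock_stream_synchronize(MockStream* stream) {
  stream->pending = 0;
}

/* Consumer side: synchronize on the stream carried by the object, then read. */
static int64_t consume_sum(const ExportedBuffer* buf) {
  assert(buf != NULL);
  if (buf->producer_stream != NULL) {
    mock_stream_synchronize(buf->producer_stream);
  }
  int64_t sum = 0;
  for (int64_t i = 0; i < buf->length; ++i) {
    sum += buf->data[i];
  }
  return sum;
}

/* Producer and consumer wired together: the producer leaves one unit of
 * pending work on its stream, and the consumer never has to guess which
 * stream that was. */
static int64_t run_example(void) {
  static const int32_t values[] = {1, 2, 3};
  MockStream stream = {1};
  ExportedBuffer buf = {values, 3, &stream};
  int64_t sum = consume_sum(&buf);
  assert(stream.pending == 0);
  return sum;
}
```

In this sketch the consumer does not need to have produced the data itself, because the stream to wait on travels with the object. Whether that matches how existing libraries handle it is exactly what the question above is asking.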
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

