zeroshade commented on PR #43632: URL: https://github.com/apache/arrow/pull/43632#issuecomment-2302898364
I just want to consolidate the discussion here; please let me know if I missed anything or got anything wrong.

## Changes to the proposal

* Either remove the `extension` parameter or change it to a `const char*` that uses the metadata encoding, as we already do for the `on_error` callback.
* Change `on_next` to `get_next_task` and introduce an `ArrowArrayTask` struct that populates the `ArrowDeviceArray`.
* Do we want to go ahead with the `waker` approach?

I'm unsure about the `ArrowArrayTask` approach, since most of the use cases we're going to have won't know how many tasks there will be ahead of time (file readers are the only cases I can think of). So I agree with @lidavidm that we shouldn't bake in an assumption that the number of tasks is known ahead of time.

Do we need an entirely separate task struct? Could `ArrowAsyncDeviceArrayStream` instead have `get_next` take a `waker` directly and use `EWOULDBLOCK` semantics for handling the stream?

My big concern is the case of an IPC stream where we don't know beforehand how many record batches are in the stream. If `get_next_task` is called multiple times in a row, we have historically used a return of 0 plus a null Array to indicate the end of the stream. So the only reasonable way to handle `get_next_task` being called multiple times in a row while tasks are pending is to return a valid task for each call until the producer *knows* that the stream has ended, and then define semantics on the task struct to indicate that the stream ended successfully. Does that make sense?

## Next steps

After confirming that everyone is on board with the ideas above, I'll update this PR with the changes for more feedback.
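To make the end-of-stream semantics concrete, here is a minimal sketch in C. The struct layouts, field names, and the "extract returns 1 for data, 0 for end-of-stream" convention are all illustrative assumptions for discussion, not the actual interface proposed in this PR:

```c
/* Hypothetical sketch of the task-based API under discussion.
 * All names and signatures here are illustrative assumptions,
 * not the real Arrow C interface. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the real ArrowDeviceArray from the C device interface. */
struct ArrowDeviceArray;

/* Hypothetical task struct: get_next_task always hands one out, and the
 * task itself reports whether the stream has ended. */
struct ArrowArrayTask {
  /* Assumed convention: returns 1 if *out was populated with a batch,
   * 0 if the stream ended successfully. */
  int (*extract_data)(struct ArrowArrayTask* self,
                      struct ArrowDeviceArray* out);
  void (*release)(struct ArrowArrayTask* self);
  void* private_data;
};

/* Hypothetical async stream: every call to get_next_task succeeds
 * while the producer does not yet know the stream has ended. */
struct ArrowAsyncDeviceArrayStream {
  int (*get_next_task)(struct ArrowAsyncDeviceArrayStream* self,
                       struct ArrowArrayTask* out_task);
  void (*release)(struct ArrowAsyncDeviceArrayStream* self);
  void* private_data;
};

/* Mock producer that does not know its batch count up front from the
 * consumer's perspective; it just keeps issuing tasks. */
struct MockProducer {
  int remaining;  /* batches left before end-of-stream */
};

static int mock_extract(struct ArrowArrayTask* self,
                        struct ArrowDeviceArray* out) {
  (void)out;  /* a real producer would populate *out here */
  return (int)(intptr_t)self->private_data;  /* 1 = data, 0 = EOS */
}

static void mock_task_release(struct ArrowArrayTask* self) {
  self->extract_data = NULL;  /* mark the task as released */
}

static int mock_get_next_task(struct ArrowAsyncDeviceArrayStream* self,
                              struct ArrowArrayTask* out_task) {
  struct MockProducer* p = (struct MockProducer*)self->private_data;
  out_task->extract_data = mock_extract;
  out_task->release = mock_task_release;
  /* Each call returns a valid task; only the task signals end-of-stream
   * once the producer knows the stream has ended. */
  out_task->private_data = (void*)(intptr_t)(p->remaining-- > 0);
  return 0;
}

/* Drains the stream, returning the number of data-bearing tasks seen
 * before the end-of-stream task (or -1 on error). */
static int consume_all(struct ArrowAsyncDeviceArrayStream* stream) {
  int count = 0;
  for (;;) {
    struct ArrowArrayTask task;
    if (stream->get_next_task(stream, &task) != 0) return -1;
    int has_data = task.extract_data(&task, NULL);
    task.release(&task);
    if (!has_data) return count;  /* stream ended successfully */
    ++count;
  }
}
```

Note that the consumer never needs a task count up front: it simply keeps requesting tasks until one of them reports end-of-stream, which is the behavior the IPC-stream case requires.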
