zeroshade commented on PR #43632:
URL: https://github.com/apache/arrow/pull/43632#issuecomment-2302898364

   I just want to consolidate the discussion here, please let me know if I 
missed anything or got anything wrong.
   
   ## Changes to the proposal
   
   * Either remove the `extension` parameter or change it to be `const char*` 
using the metadata encoding like we're doing for the `on_error` callback
   * Change `on_next` to instead be `get_next_task` and create an 
`ArrowArrayTask` struct that populates the `ArrowDeviceArray`
   * Do we want to go ahead with the `waker` approach?
   
   I'm unsure about the `ArrowArrayTask` approach since the majority of use 
cases we're going to have aren't going to know how many tasks there will be 
ahead of time (file readers are the only cases I can think of) so I agree with 
@lidavidm that I don't think we want to bake in an assumption of knowing how 
many tasks will be there ahead of time.
   
   Do we need the entirely separate task struct? Could 
`ArrowAsyncDeviceArrayStream` instead have the `get_next` directly take a 
`waker` and have the whole `EWOULDBLOCK` semantics for handling the stream? 
   
   My big concern is the case of an IPC stream where we don't know beforehand 
how many record batches are in the stream. If `get_next_task` is called 
multiple times in a row, historically we've used a return of 0 + null for the 
Array to indicate the end of the stream. So the only reasonable way to handle 
`get_next_task` being called multiple times in a row while there are tasks 
pending is to return a valid task for each call until the producer *knows* that 
the stream has ended and then we define semantics for the task struct to 
indicate that the stream has ended successfully.  
   
   Does that make sense?
   
   ## Next steps
   
   After confirming that everyone is on-board with the above ideas, I'll update 
this PR with the changes for more feedback.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to