zeroshade commented on PR #43632:
URL: https://github.com/apache/arrow/pull/43632#issuecomment-2284467244
@westonpace your last example confuses me a bit, particularly because the
entire purpose of this is to create a *push-based* approach for async handling,
rather than a pull-based approach.
Below are my questions:
```c
struct Waker {
// Signal to the producer that more data is available, the consumer shall
release any resources
// associated with the waker after this method is called. The producer
shall not call any other
// methods after calling this method.
//
// The producer must always call this method even if an error is
encountered (in which case the
// error will be reported on the following call to `get_next`).
void wake(Waker* waker);
void* private_data;
};
```
I don't understand signalling to the producer that data is available, the
producer knows when data is available right? it's producing the data in the
first place. Isn't the point that the producer needs to signal the *consumer*
that more data is available? `Waker` would need to be consumer created, with
the producer calling `wake` when data is available. Was your comment just a
typo, or am I missing something?
```c
struct ArrowAsyncDeviceArrayStream {
// Callback to get the next array task
// (if no error and the task is released, the stream has ended)
//
// Return value: 0 if successful, an `errno`-compatible error code
otherwise.
//
// The consumer is allowed to call this method again even if the tasks
returned by a previous
// call have not completed. However, the consumer shall not call this
re-entrantly.
//
// If successful, the ArrowDeviceArray must be released independently from
the stream.
int (*get_next_task)(struct ArrowAsyncDeviceArrayStream* self, struct
ArrowArrayTask* task);
// The rest is identical to `ArrowDeviceArrayStream`
ArrowDeviceType device_type;
const char* (*get_last_error)(struct ArrowAsyncDeviceArrayStream* self);
void (*release)(struct ArrowAsyncDeviceArrayStream* self);
void* private_data;
};
```
Why the abstraction of a task in this case? Couldn't we remove the task
abstraction and just put the `Waker` here? It feels like the `get_next_task`
abstraction here is just a way to re-invert the direction of the calls.
Producer calls `get_next_task` which turns around and calls `task->get_next`?
At that point, it feels like we could bypass the idea of needing
`ArrowAsyncDeviceArrayStream` to be constructed by the consumer and instead
build the entire API around the waker?
Also in the case of this being able to handle concurrency, the
`get_last_error` function becomes mostly meaningless, right? Depending on when
it is called, you end up with a race condition as far as what the error might
be.
Finally, If a consumer can call `get_next_task` before the previous one has
completed, what should happen in the following scenario:
* There is only one record in the stream, it is delayed
* Consumer calls `get_next_task` 5 times in a loop to get the next 5 tasks
Do all 5 calls from the consumer wait until the record comes in? Does each
one provide its own task object which then turns around and waits until the
record comes in to call wake? The producer then needs to manage the order the
calls came in, along with the order that the records come in for the calls to
`get_next`. This seems like a lot of added complexity without much benefit.
All in all, i'm not completely sold on the idea of this re-inversion of the
paradigm. It honestly feels sorta hacky, but that might just be me.
@paleolimbot @lidavidm thoughts on @westonpace's suggestion here?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]