westonpace commented on PR #43632:
URL: https://github.com/apache/arrow/pull/43632#issuecomment-2282026639
Ah, one more potential approach. This one is actually my favorite for a
file(s) scanner. It has all the advantages of a pull-based approach plus it
puts the consumer in complete control of deciding how much decode parallelism
there should be (by calling `get_next_task` again before the tasks returned by
earlier calls have completed). On the other hand, it can make things more
difficult if the producer does not have any control over the parallelism or
does not know the size of the stream ahead of time (e.g. maybe it is receiving
data from a TCP stream):
```
struct Waker {
  // Signal to the consumer that more data is available. The consumer shall
  // release any resources associated with the waker after this method is
  // called. The producer shall not call any other methods after calling this
  // method.
  //
  // The producer must always call this method, even if an error is
  // encountered (in which case the error will be reported on the following
  // call to `get_next`).
  void (*wake)(struct Waker* waker);
  void* private_data;
};
struct ArrowArrayTask {
  // Callback to get the array
  // (if no error and the array is released, the stream has ended)
  //
  // Return value: 0 if successful, an `errno`-compatible error code otherwise.
  //
  // If EWOULDBLOCK is returned then the array is not ready. If the producer
  // returns this value then the producer takes ownership of `waker` and is
  // responsible for calling `wake` when more data is available. If any other
  // value is returned then the producer shall ignore `waker` and never call
  // any methods on it.
  //
  // If successful, the ArrowDeviceArray must be released independently from
  // the stream.
  int (*get_next)(struct ArrowArrayTask* self, struct Waker* waker,
                  struct ArrowDeviceArray* out);
  void (*release)(struct ArrowArrayTask* self);
  void* private_data;
};
struct ArrowAsyncDeviceArrayStream {
  // Callback to get the next array task
  // (if no error and the task is released, the stream has ended)
  //
  // Return value: 0 if successful, an `errno`-compatible error code otherwise.
  //
  // The consumer is allowed to call this method again even if the tasks
  // returned by a previous call have not completed. However, the consumer
  // shall not call this re-entrantly.
  //
  // If successful, the ArrowArrayTask must be released independently from
  // the stream.
  int (*get_next_task)(struct ArrowAsyncDeviceArrayStream* self,
                       struct ArrowArrayTask* task);
  // The rest is identical to `ArrowDeviceArrayStream`
  ArrowDeviceType device_type;
  const char* (*get_last_error)(struct ArrowAsyncDeviceArrayStream* self);
  void (*release)(struct ArrowAsyncDeviceArrayStream* self);
  void* private_data;
};
```
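
To make the `EWOULDBLOCK` / `wake` handshake concrete, here is a rough,
single-threaded sketch of how a consumer might drive one task. It is not part
of the proposal: `DemoArray` and `DemoTask` are hypothetical stand-ins for
`ArrowDeviceArray` and `ArrowArrayTask`, and `ProducerState`,
`producer_data_arrived`, `set_flag` and `run_demo` are made up purely for
illustration. A real producer would presumably call `wake` from another thread
or an I/O callback rather than inline.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for ArrowDeviceArray: a single int payload. */
struct DemoArray { int value; };

struct Waker {
  void (*wake)(struct Waker* waker);
  void* private_data;
};

/* Hypothetical stand-in for ArrowArrayTask (release omitted for brevity). */
struct DemoTask {
  int (*get_next)(struct DemoTask* self, struct Waker* waker,
                  struct DemoArray* out);
  void* private_data;
};

/* Illustrative producer state: data is not ready on the first poll. */
struct ProducerState {
  int ready;
  struct Waker* pending; /* waker owned by the producer after EWOULDBLOCK */
};

static int demo_get_next(struct DemoTask* self, struct Waker* waker,
                         struct DemoArray* out) {
  struct ProducerState* st = self->private_data;
  if (!st->ready) {
    st->pending = waker; /* producer takes ownership of the waker */
    return EWOULDBLOCK;
  }
  out->value = 42; /* any other return value: the waker is ignored */
  return 0;
}

/* Simulates data arriving; the producer calls wake exactly once. */
static void producer_data_arrived(struct ProducerState* st) {
  st->ready = 1;
  if (st->pending != NULL) {
    struct Waker* w = st->pending;
    st->pending = NULL;
    w->wake(w); /* no further calls on w after this */
  }
}

/* Consumer-side wake callback: just records that the task can be polled. */
static void set_flag(struct Waker* waker) { *(int*)waker->private_data = 1; }

/* Poll once, wait for the wake on EWOULDBLOCK, then poll again. */
static int run_demo(void) {
  struct ProducerState st = {0, NULL};
  struct DemoTask task = {demo_get_next, &st};
  int woken = 0;
  struct Waker waker = {set_flag, &woken};
  struct DemoArray arr = {0};

  int rc = task.get_next(&task, &waker, &arr);
  if (rc == EWOULDBLOCK) {
    producer_data_arrived(&st); /* would happen asynchronously in practice */
    assert(woken);
    rc = task.get_next(&task, &waker, &arr);
  }
  return rc == 0 ? arr.value : -1;
}
```

The point of the sketch is the ownership rule: the producer only keeps the
waker when it returns `EWOULDBLOCK`, and the consumer only polls again after
`wake` has fired.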