lidavidm commented on issue #33986:
URL: https://github.com/apache/arrow/issues/33986#issuecomment-1412848157
It sounds like what you basically want is a
```c
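struct DataFrameProducer {
  /* C structs cannot hold member functions, so read is a function pointer.
     The ??? types are undecided: the filter/selection format is still open. */
  int (*read)(struct DataFrameProducer* self, ??? filters, ??? selection,
              struct ArrowArrayStream* out);
};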
```
with corresponding wrappers/carriers in Python, Rust, Go, etc. Then this can
be fed into DuckDB, Ballista, Acero, etc. and can be produced by ADBC, Acero,
DuckDB, etc.
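To make the shape concrete, here is a minimal sketch of such a producer, not an existing Arrow ABI. The `ArrowArray` and `ArrowArrayStream` layouts follow the Arrow C data/stream interfaces. Everything else is an assumption made for illustration: the `release` and `private_data` members on `DataFrameProducer`, the use of `const void*` as a stand-in for the undecided filter/selection types, and the toy `make_empty_producer` helper, which ignores its arguments and yields a stream with zero batches.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Forward declaration only; the sketch never looks inside a schema. */
struct ArrowSchema;

/* Layout as defined by the Arrow C data interface. */
struct ArrowArray {
  int64_t length;
  int64_t null_count;
  int64_t offset;
  int64_t n_buffers;
  int64_t n_children;
  const void** buffers;
  struct ArrowArray** children;
  struct ArrowArray* dictionary;
  void (*release)(struct ArrowArray*);
  void* private_data;
};

/* Layout as defined by the Arrow C stream interface. */
struct ArrowArrayStream {
  int (*get_schema)(struct ArrowArrayStream*, struct ArrowSchema* out);
  int (*get_next)(struct ArrowArrayStream*, struct ArrowArray* out);
  const char* (*get_last_error)(struct ArrowArrayStream*);
  void (*release)(struct ArrowArrayStream*);
  void* private_data;
};

/* HYPOTHETICAL interface. const void* stands in for the undecided
   filter/selection types; release/private_data mirror the Arrow structs. */
struct DataFrameProducer {
  int (*read)(struct DataFrameProducer* self, const void* filters,
              const void* selection, struct ArrowArrayStream* out);
  void (*release)(struct DataFrameProducer* self);
  void* private_data;
};

/* Toy stream: schema export is left out of this sketch. */
static int empty_get_schema(struct ArrowArrayStream* s, struct ArrowSchema* out) {
  (void)s;
  (void)out;
  return ENOSYS;
}

/* End of stream is signalled by returning 0 with a released array. */
static int empty_get_next(struct ArrowArrayStream* s, struct ArrowArray* out) {
  (void)s;
  assert(out != NULL);
  out->release = NULL;
  return 0;
}

static const char* empty_get_last_error(struct ArrowArrayStream* s) {
  (void)s;
  return NULL; /* no detailed error message available */
}

static void empty_stream_release(struct ArrowArrayStream* s) {
  s->release = NULL; /* mark the stream as released */
}

/* Toy read: ignores filters and selection and hands back the empty stream. */
static int empty_read(struct DataFrameProducer* self, const void* filters,
                      const void* selection, struct ArrowArrayStream* out) {
  (void)self;
  (void)filters;
  (void)selection;
  assert(out != NULL);
  out->get_schema = empty_get_schema;
  out->get_next = empty_get_next;
  out->get_last_error = empty_get_last_error;
  out->release = empty_stream_release;
  out->private_data = NULL;
  return 0;
}

static void empty_producer_release(struct DataFrameProducer* self) {
  self->release = NULL; /* mark the producer as released */
}

/* HYPOTHETICAL helper: fill in a producer that yields no batches. */
void make_empty_producer(struct DataFrameProducer* out) {
  out->read = empty_read;
  out->release = empty_producer_release;
  out->private_data = NULL;
}
```

A consumer would call `read` once per query and then drain the returned stream with `get_next` until it receives a released array. Reusing the release-callback convention from the Arrow structs is a design guess that keeps ownership rules uniform across the producer and the stream.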
I think you can nearly have both "pure Python" and "C ABI". PyArrow could
transform a Python-level interface into the C ABI and vice versa. Where
possible, the Python-level interface should let you 'extract' the underlying C
ABI, if it exists; otherwise we can push the responsibility for the GIL and
such into PyArrow (or something like that). (So basically, shove all the
non-Python code into PyArrow.)
The question is what the filter/selection format should be. Ideally it would
be language-agnostic and implementation-agnostic, so Dataset's expressions
aren't a great fit there.