buhrmann opened a new issue, #39915:
URL: https://github.com/apache/arrow/issues/39915
### Describe the enhancement requested
Hi, I was wondering whether there is a trick, or a planned feature, for
exposing a Dataset interface over multiple IPC/Parquet files on disk where
the files contain different columns rather than different rows. I know I can
combine them cheaply in memory, but the idea is for this interface to work
everywhere that existing tools expect an Arrow dataset, and to allow
streaming access to the data, e.g. via ibis, polars, duckdb etc.
Or perhaps there is a trick to cheaply concatenate various datasets? I can
see the SQL-like `dataset.join()`, but not a simple horizontal concatenation
("join on row numbers").
### Component(s)
C++, Python