[I] Dataset-like interface for "columnar" partitioning [arrow]

via GitHub Fri, 02 Feb 2024 08:43:49 -0800


buhrmann opened a new issue, #39915:
URL: https://github.com/apache/arrow/issues/39915


   ### Describe the enhancement requested
   
   Hi, I was wondering whether there is a trick, or a planned feature, for 
exposing a Dataset interface over multiple IPC/Parquet files on disk, where the 
files hold different columns rather than different rows. I know I can combine 
them cheaply in memory, but the idea is for this interface to work 
everywhere that existing tools expect an Arrow dataset, and to allow 
streaming access to the data, e.g. via ibis, polars or duckdb.
   
   Or perhaps there is a trick to cheaply concatenate various datasets? I can 
see the SQL-like `dataset.join()`, but not a simple horizontal concatenation 
("join on row numbers").
   
   ### Component(s)
   
   C++, Python


