[I] Enable `col_select` or similar in `open_csv_dataset` to read files with a shared subset of columns [arrow]

via GitHub Thu, 05 Oct 2023 01:30:57 -0700


orgadish opened a new issue, #38031:
URL: https://github.com/apache/arrow/issues/38031


   ### Describe the enhancement requested
   
   Per the documentation, `col_select` is currently not supported in 
`arrow::open_csv_dataset` and it is recommended to "instead, subset columns 
after dataset creation". 
   
   This approach doesn't work, however, when the files don't all share the same 
full schema. Often I only care about a subset of columns that I know are 
present in every file, even if arbitrary other columns have been added to some 
of them. It would be great if there were a way to specify that columns outside 
the schema should be ignored, or to enable `col_select`.
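
To illustrate the situation, here is a minimal sketch of an eager workaround, not the requested feature. It reads each file separately with `read_csv_arrow()`, which does accept `col_select`, and then stacks the results with `concat_tables()`. The file names, column names, and temporary directory are hypothetical and only there to make the example self-contained. Unlike a dataset, this loads every file into memory.

```r
library(arrow)

# Hypothetical input: two CSV files that share `id` and `value`
# but otherwise have different columns (and a different column order).
dir <- tempfile("csv_subset_")
dir.create(dir)
writeLines(c("id,value,extra_a", "1,10,x", "2,20,y"), file.path(dir, "a.csv"))
writeLines(c("id,note,value", "3,foo,30"), file.path(dir, "b.csv"))

# Columns assumed to be present in every file.
shared_cols <- c("id", "value")

# Read each file on its own, keeping only the shared columns.
tables <- lapply(
  list.files(dir, pattern = "\\.csv$", full.names = TRUE),
  function(path) {
    read_csv_arrow(
      path,
      col_select = tidyselect::all_of(shared_cols),
      as_data_frame = FALSE  # keep the result as an Arrow Table
    )
  }
)

# Stack the per-file tables into a single Arrow Table.
combined <- do.call(concat_tables, tables)
```

The result is an in-memory Table rather than a lazily scanned dataset, so this only helps when the selected columns fit in memory, which is why support in `open_csv_dataset` would still be valuable.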
   
   ### Component(s)
   
   R


