[ 
https://issues.apache.org/jira/browse/ARROW-10219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17209762#comment-17209762
 ] 

Neal Richardson commented on ARROW-10219:
-----------------------------------------

I didn't know about include_columns, thanks.

Here are two use cases for getting the column names without reading the 
whole table:

* R's various CSV readers all let you specify column types as an unnamed vector 
of types; column names can also be specified, but via a separate argument. The 
Arrow CSV reader currently can't do this: you can't specify column types while 
letting the column names be read from the file. So in this case, I'd like to 
instantiate a TableReader with the other given options, query it for the 
column names, and then use those names to create the fully specified 
TableReader to call Read on.
* Some of R's CSV readers let you specify columns to keep in (or exclude from) 
the resulting data frame either by integer indices or by some expression (e.g. 
{{starts_with("something")}}). In order to pass those to 
{{ConvertOptions::include_columns}}, I need to get the column names from the 
CSV so that I can translate the indices or expression into names.
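
To make the two use cases concrete, here is a minimal sketch in plain Python 
using only the standard library. It is not the Arrow API: the helper names 
(read_column_names, types_by_name, select_columns), the sample data, and the 
type strings are all hypothetical, and indices are 0-based here whereas R's 
are 1-based. It only illustrates the translation step that becomes possible 
once the header can be read on its own: the first result has the shape of a 
name-to-type map, and the other two are name lists of the kind 
{{ConvertOptions::include_columns}} expects.

```python
import csv
import io


def read_column_names(source, delimiter=","):
    """Return the header row of a CSV stream without reading the data rows."""
    reader = csv.reader(source, delimiter=delimiter)
    return next(reader, [])


def types_by_name(names, types):
    """Pair an unnamed, positional list of types with names read from the file."""
    if len(types) != len(names):
        raise ValueError("expected %d types, got %d" % (len(names), len(types)))
    return dict(zip(names, types))


def select_columns(names, indices=None, predicate=None):
    """Translate 0-based indices or a name predicate into column names."""
    if indices is not None:
        return [names[i] for i in indices]
    if predicate is not None:
        return [name for name in names if predicate(name)]
    return list(names)


# Hypothetical sample file; only the first line is consumed by the header read.
sample = io.StringIO("id,score_a,score_b,label\n1,0.5,0.7,x\n2,0.1,0.9,y\n")
names = read_column_names(sample)

# Use case 1: unnamed vector of types plus names taken from the file.
column_types = types_by_name(names, ["int64", "float64", "float64", "string"])

# Use case 2: integer indices or a starts_with-style expression become names.
by_index = select_columns(names, indices=[0, 3])
by_prefix = select_columns(names, predicate=lambda n: n.startswith("score_"))
```

With a {{column_names}} or {{schema}} method on the reader, the first helper 
would be replaced by that call and the rest would stay on the R side.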

> [C++] csv::TableReader column names, Read() arguments
> -----------------------------------------------------
>
>                 Key: ARROW-10219
>                 URL: https://issues.apache.org/jira/browse/ARROW-10219
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Neal Richardson
>            Priority: Major
>             Fix For: 3.0.0
>
>
> Some feature requests:
> * csv::TableReader {{column_names}} method, and/or {{schema}} method. This 
> will (in most cases) require IO to get these from the file, but that's fine. 
> There are use cases (we've seen in R) where it would help to be able to get 
> the names from the file (e.g. when you specify column types, it's a map of 
> column name to type, so you can't currently specify types without also 
> specifying names)
> * Add Read(std::vector<int>), as the feather (and parquet?) readers have, so 
> that you don't have to parse and allocate columns you don't want.
> cc [~apitrou] [~romainfrancois]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
