[jira] [Commented] (ARROW-5977) [C++] [Python] Method for read_csv to limit which columns are read?

Neal Richardson (JIRA) Tue, 06 Aug 2019 07:45:06 -0700


    [ 
https://issues.apache.org/jira/browse/ARROW-5977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901124#comment-16901124
 ]


Neal Richardson commented on ARROW-5977:
----------------------------------------

All of R's main CSV readers support this. One way they all expose it is by 
allowing you to provide a null type for some columns when you specify their 
types explicitly. A couple of the readers also allow you to specify columns by 
name or position to keep or drop. 
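
To make the keep/drop idea concrete, here is a rough sketch using only Python's 
standard csv module. The helper name read_selected_columns and its keep/drop 
parameters are hypothetical, chosen for illustration; this is not a pyarrow or 
pandas API, just the general shape of selecting columns by name or position 
(similar in spirit to the pandas usecols option mentioned in the issue). 

```python
import csv
import io


def read_selected_columns(text, keep=None, drop=None):
    """Parse CSV text and return {column name: list of string values}.

    keep / drop may contain column names (str) or 0-based positions (int).
    Hypothetical helper for illustration only.
    """
    if keep is not None and drop is not None:
        raise ValueError("pass either keep or drop, not both")

    reader = csv.reader(io.StringIO(text))
    header = next(reader)

    def resolve(selectors):
        # Map a mix of names and positions to a set of column indices.
        return {s if isinstance(s, int) else header.index(s) for s in selectors}

    if keep is not None:
        wanted = resolve(keep)
    elif drop is not None:
        wanted = set(range(len(header))) - resolve(drop)
    else:
        wanted = set(range(len(header)))

    indices = sorted(wanted)
    columns = {header[i]: [] for i in indices}
    for row in reader:
        for i in indices:
            # Only the selected fields are stored; the others are discarded.
            columns[header[i]].append(row[i])
    return columns


data = "a,b,c\n1,2,3\n4,5,6\n"
kept = read_selected_columns(data, keep=["a", 2])   # by name and by position
dropped = read_selected_columns(data, drop=["b"])   # same result via drop
```

Note that this sketch still tokenizes every field of every row; it only avoids 
storing the unselected columns. A real reader could go further and skip type 
conversion for those columns entirely. 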

I think this is a good idea not just in the context of reading a CSV itself but 
also for the Datasets framework, where we are lazily reading chunks of data as 
needed and trying to be efficient with memory usage. 

> [C++] [Python] Method for read_csv to limit which columns are read?
> -------------------------------------------------------------------
>
>                 Key: ARROW-5977
>                 URL: https://issues.apache.org/jira/browse/ARROW-5977
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++, Python
>    Affects Versions: 0.14.0
>            Reporter: Jordan Samuels
>            Priority: Major
>              Labels: csv
>
> In pandas there is pd.read_csv(usecols=...) but I can't see a way to do this 
> in pyarrow. 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
