[ https://issues.apache.org/jira/browse/ARROW-6830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Neal Richardson updated ARROW-6830:
-----------------------------------
    Summary: [R] Select Subset of Columns in read_arrow  (was: Question / Feature Request- Select Subset of Columns in read_arrow)

> [R] Select Subset of Columns in read_arrow
> ------------------------------------------
>
>                 Key: ARROW-6830
>                 URL: https://issues.apache.org/jira/browse/ARROW-6830
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: C++, R
>            Reporter: Anthony Abate
>            Priority: Minor
>
> *Note:* Not sure if this is a limitation of the R library or the underlying
> C++ code.
> I have a ~30 GB Arrow file with almost 1000 columns. It has 12,000 record
> batches of varying row counts.
> 1. Is it possible to use *read_arrow* to filter out columns? (similar to
> how *read_feather* has col_select =... )
> 2. Or is it possible to filter columns using *RecordBatchFileReader*?
>
> The only thing I seem to be able to do (please confirm if this is my only
> option) is loop over all record batches, select a single column at a time,
> and manually construct the data I need to pull out, i.e. like the following:
> {code:java}
> # Batch indices are 0-based, so the last valid index is num_record_batches - 1
> for (i in seq_len(data_rbfr$num_record_batches) - 1) {
>   rbn <- data_rbfr$get_batch(i)
>
>   if (i == 0) {
>     # First batch: start the result from column 5
>     merged <- as.data.frame(rbn$column(5)$as_vector())
>   } else {
>     # Later batches: append column 5 to the rows collected so far
>     dfn <- as.data.frame(rbn$column(5)$as_vector())
>     merged <- rbind(merged, dfn)
>   }
>
>   print(paste(i, nrow(merged)))
> }
> {code}
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)