[ 
https://issues.apache.org/jira/browse/ARROW-15260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17516737#comment-17516737
 ] 

Nicola Crane commented on ARROW-15260:
--------------------------------------

This is now implemented in C++ via ARROW-15281; however, bringing this
functionality into the R code will need some work and could be a bit tricky.
Here are my notes on what I think the tricky bits will be.

In Python, we can do something like {{scanner = dataset_reader.scanner(dataset, 
columns=['__filename'])}}.
In the R code, the Scanner object looks like the closest analog.  We use
{{Scanner$create()}} to create a Scanner object, and it uses the {{projection}}
argument to specify columns.

In the body of {{Scanner$create()}}, we have this code: {{proj <- 
c(dataset$selected_columns, dataset$temp_columns)}} and then later 
{{stopifnot("attempting to project with unknown columns" = all(projection %in% 
names(proj)))}}

So we'll need to change this check in some way so that a "metadata" kind of
column such as {{__filename}} can be selected.
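
A rough sketch of what such a change might look like, in plain R rather than the actual arrow internals: the check quoted above could accept a known set of special fields in addition to the names in {{proj}}. The names {{special_fields}} and {{check_projection()}} are made up for illustration and are not part of the arrow package; {{proj}} here is only a stand-in for the real value.

{code:r}
# Hypothetical: names of "metadata" columns the scanner could supply itself
special_fields <- c("__filename")

# Hypothetical helper mirroring the check quoted above, but also accepting
# the special fields as known column names
check_projection <- function(projection, proj, special = special_fields) {
  known <- c(names(proj), special)
  stopifnot(
    "attempting to project with unknown columns" = all(projection %in% known)
  )
  invisible(TRUE)
}

# Stand-in for c(dataset$selected_columns, dataset$temp_columns)
proj <- c(cyl = "cyl", mpg = "mpg")

check_projection(c("cyl", "__filename"), proj)  # passes
try(check_projection(c("cyl", "nope"), proj))   # still errors as before
{code}

Whether the real fix belongs in this check or further down in how the projection is built is an open question; the sketch only shows that the validation itself need not reject the special name.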

It may be complicated further by the fact that this deviates a bit from the
usual way of using {{dplyr::select()}}; i.e. if I set up a basic dataset based
on the {{mtcars}} dataset and try to call {{select(my_dataset, cyl,
__filename)}}, I get {{Error: unexpected input in "select(my_dataset, cyl,
_"}}, which is different from the usual {{Can't subset columns that don't
exist.}} error message I might expect, so there may be a syntax issue to deal
with here too.
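
For context, the {{unexpected input}} error comes from R's parser rather than from dplyr: a name starting with an underscore is not syntactically valid in R unless it is wrapped in backticks. A minimal base-R sketch of the difference follows; {{my_dataset}} is only a placeholder and is never evaluated, since the calls are parsed but not run.

{code:r}
# Unquoted: fails at parse time, before select() is ever called
bad <- try(parse(text = "select(my_dataset, cyl, __filename)"), silent = TRUE)
inherits(bad, "try-error")

# Backtick-quoted: parses fine, so select() would at least see the name
good <- parse(text = "select(my_dataset, cyl, `__filename`)")
as.character(good[[1]][[4]])
{code}

So users would presumably have to write {{`__filename`}} with backticks, unless the R bindings expose the column under a syntactically valid name.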

> [R] open_dataset - add file_name as column
> ------------------------------------------
>
>                 Key: ARROW-15260
>                 URL: https://issues.apache.org/jira/browse/ARROW-15260
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: R
>            Reporter: Martin du Toit
>            Priority: Minor
>
> Hi. Is it possible to add the file_name as a column to a dataset?
> {code:r}
> ds <- open_dataset(.....)
> list_of_files <- ds$files
> {code}
> This works, but I need the file_name as a column.
> Thanks
>  



