[
https://issues.apache.org/jira/browse/ARROW-15260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17516737#comment-17516737
]
Nicola Crane commented on ARROW-15260:
--------------------------------------
This is now implemented in the C++ library via ARROW-15281; however, bringing
this functionality into the R package will need some work and could be a bit
tricky. Here are my notes on what I think the tricky bits will be.
In Python, we can do something like {{scanner = dataset_reader.scanner(dataset,
columns=['__filename'])}}.
In the R code, the {{Scanner}} object looks like the closest analog. We use
{{Scanner$create()}} to create a Scanner object, and its {{projection}}
argument specifies which columns to return.
In the body of {{Scanner$create()}}, we have this code: {{proj <-
c(dataset$selected_columns, dataset$temp_columns)}} and then later
{{stopifnot("attempting to project with unknown columns" = all(projection %in%
names(proj)))}}
Because {{proj}} only contains the dataset's selected and temporary columns,
any requested name outside those, such as {{__filename}}, fails this check.
So we'll need to change this so that this "metadata" kind of column can be
selected.
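To make the check above concrete, here's a minimal base-R sketch (no arrow
needed). The lists {{selected_columns}} and {{temp_columns}} are plain stand-ins
for {{dataset$selected_columns}} and {{dataset$temp_columns}} (which really hold
Arrow Expressions), and {{augmented_fields}} and {{check_projection()}} are
hypothetical names, not part of the arrow package. It only illustrates one
possible shape of the change: accepting a known list of augmented fields
alongside the dataset's own columns.
{code:r}
# Hypothetical stand-ins for dataset$selected_columns / dataset$temp_columns.
selected_columns <- list(mpg = "mpg", cyl = "cyl")
temp_columns <- list()
proj <- c(selected_columns, temp_columns)

projection <- c("cyl", "__filename")

# Current behaviour: "__filename" is not among names(proj), so this is FALSE
# and the stopifnot() in Scanner$create() would error.
current_check_passes <- all(projection %in% names(proj))

# Assumption: a hard-coded list of augmented fields the C++ scanner can fill in.
# This vector is illustrative only; it is not an arrow API.
augmented_fields <- c("__filename")

# Hypothetical relaxed check: dataset columns plus augmented fields are allowed;
# anything else still fails with the existing message.
check_projection <- function(projection, proj, augmented = augmented_fields) {
  stopifnot(
    "attempting to project with unknown columns" =
      all(projection %in% c(names(proj), augmented))
  )
  invisible(TRUE)
}

relaxed_check_passes <- isTRUE(check_projection(projection, proj))
{code}
Keeping the allowed names in an explicit list means genuinely unknown columns
still fail with the same message. The real change would also need
{{__filename}} to reach the projection passed down to C++, which this sketch
doesn't cover.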
It may be complicated further by the fact that this deviates a bit from the
usual way of using {{dplyr::select()}}. For example, if I set up a basic
dataset based on {{mtcars}} and call {{select(my_dataset, cyl,
__filename)}}, I get {{Error: unexpected input in "select(my_dataset, cyl,
_"}}, which is different from the usual {{Can't subset columns that don't
exist.}} error message I might expect. That's a parse error from R itself
rather than from dplyr: a name beginning with an underscore isn't
syntactically valid in R unless it's wrapped in backticks, so there may be
something to work out around the syntax here too.
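A quick base-R way to confirm that the error comes from R's parser rather than
from dplyr: {{parse_ok()}} is a hypothetical helper, and {{my_dataset}} is never
evaluated, so no dataset or dplyr is needed. Whether arrow would then accept
the backticked name in {{select()}} is the open question above; this only
shows the syntax side.
{code:r}
# Hypothetical helper: TRUE if the string parses as R code, FALSE otherwise.
# Nothing is evaluated, so my_dataset and select() don't need to exist.
parse_ok <- function(code) {
  tryCatch({
    parse(text = code)
    TRUE
  }, error = function(e) FALSE)
}

# A name starting with "_" is not syntactic, so this fails at parse time.
unquoted_ok <- parse_ok("select(my_dataset, cyl, __filename)")

# Backticks turn the same text into a valid (non-syntactic) symbol.
backticked_ok <- parse_ok("select(my_dataset, cyl, `__filename`)")
{code}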
> [R] open_dataset - add file_name as column
> ------------------------------------------
>
> Key: ARROW-15260
> URL: https://issues.apache.org/jira/browse/ARROW-15260
> Project: Apache Arrow
> Issue Type: New Feature
> Components: R
> Reporter: Martin du Toit
> Priority: Minor
>
> Hi. Is it possible to add the file_name as a column to a dataset?
> {code:r}
> ds <- open_dataset(.....)
> list_of_files <- ds$files
> {code}
> This works, but I need the file_name as a column.
> Thanks
>