[I] Add option to record file path in `open_dataset` and `open_csv_dataset` [arrow]

via GitHub Thu, 05 Oct 2023 03:03:16 -0700


orgadish opened a new issue, #38036:
URL: https://github.com/apache/arrow/issues/38036


   ### Describe the enhancement requested
   
   I often have to work with data where there is information stored in the file 
path (e.g. in the directory containing this file, or in the file name).
   
   When I use `readr::read_csv`, there is an `id` argument:
   
   id | The name of a column in which to store the file path. This is useful when reading multiple input files and there is data in the file paths, such as the data collection date. If NULL (the default) no extra column is created.
   -- | --
   
   As far as I can tell, the only way to recreate this with `open_csv_dataset` 
currently is to read each file, resave it with the file path added as a column, 
and then use `open_csv_dataset`. (I know the list of files is stored in the 
Dataset object, but I don't know whether the Dataset keeps track of which data 
came from which file.)
   
   It would be great if there were an `id` equivalent for the `open_dataset` 
functions.
   
   ### Component(s)
   
   R


