thisisnic commented on issue #44889:
URL: https://github.com/apache/arrow/issues/44889#issuecomment-2558192102

   Hi @JakeRuss, thanks for opening the issue!  When a dataset is created via 
`open_dataset()`, one of the first things that happens is a scan of all 
matching files, and the resulting dataset object keeps a fixed list of those files.  
Supporting the change you suggest would mean fundamentally changing how datasets 
work, so unfortunately it's unlikely to be feasible, sorry!
   
   Happy to help think through possible workarounds, though.  The first things that 
come to mind - and you may have already thought of these - are: modify the 
upstream pipeline so it doesn't produce random filenames, if that's possible; or 
create one dataset from the files you know to be fixed, create a second, smaller 
dataset from the file that has just been renamed, and then pass both datasets to 
`open_dataset()` to combine them into a single dataset (see the sketch below). 
Might either of those options work for you?
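   
   Here's a minimal sketch of that second option. The paths (`data/stable/`, 
`data/latest.parquet`) are just placeholders for whatever your layout actually is, 
and it assumes both sources have compatible schemas:
   
   ```r
   library(arrow)
   
   # Hypothetical layout: "data/stable/" holds the files whose names never change,
   # "data/latest.parquet" is the file whose name changes between pipeline runs.
   ds_stable <- open_dataset("data/stable", format = "parquet")
   ds_latest <- open_dataset("data/latest.parquet", format = "parquet")
   
   # open_dataset() also accepts a list of Dataset objects and returns a single
   # combined dataset, as long as the schemas line up.
   ds <- open_dataset(list(ds_stable, ds_latest))
   
   ds |>
     dplyr::count() |>
     dplyr::collect()
   ```
   
   One nice property of this approach is that the larger, stable dataset only needs 
to be opened once and can be reused; only the small single-file dataset has to be 
re-opened when the filename changes, which should be cheap.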

