[GitHub] [arrow] jonkeane commented on issue #37816: [R] open_dataset(..., unify_schemas=FALSE) opens all files

via GitHub Thu, 21 Sep 2023 09:49:14 -0700


jonkeane commented on issue #37816:
URL: https://github.com/apache/arrow/issues/37816#issuecomment-1729950592


   Thanks for that output, and sorry I skipped over the part of your first message about pyarrow being quicker.
   
   I presume those are the FUSE outputs from running `open_dataset("nyc-taxi", unify_schemas=FALSE)` and then `ds.dataset("nyc-taxi", partitioning="hive")`, respectively, yeah? And are you running them in that order? If so, do you see the same behavior if you run the R version a second time?
   
   > Maybe `ls -l --time=atime` or `strace`? Maybe there is a way to create a mock filesystem in order to verify what operations R arrow is performing? Any advice is appreciated.
   
   IIRC, both Python and R use the exact same C++-based filesystem machinery under the hood. There might be small differences in the options each binding passes (which we should investigate), but ultimately both go through [the same C++ filesystem interface](https://arrow.apache.org/docs/cpp/io.html).

