Tom-Newton commented on issue #18014: URL: https://github.com/apache/arrow/issues/18014#issuecomment-1700666053
I think we're ready to start implementing the filesystem itself. Looking at how [GCS was done](https://github.com/apache/arrow/commits/main?before=9b6be29f431705ce1f85cc218c66d4d03698f06b+35&branch=main&path%5B%5D=cpp&path%5B%5D=src&path%5B%5D=arrow&path%5B%5D=filesystem&path%5B%5D=gcsfs.cc&qualified_name=refs%2Fheads%2Fmain), the next part was an implementation of `arrow::io::InputStream` and `OpenInputStream`. However, it looks like `arrow::io::RandomAccessFile` is a superset of `arrow::io::InputStream`, so I think it makes sense to just implement `arrow::io::RandomAccessFile`. This is what https://github.com/apache/arrow/pull/12914 did.

I would propose we move forward with implementing `arrow::io::RandomAccessFile`, `OpenInputStream` and `OpenInputFile` as in https://github.com/apache/arrow/pull/12914. One change I would suggest is to avoid depending on whether the storage account has hierarchical namespace enabled. Hierarchical namespace matters if you want faster listing and renames, but for blob reads I don't think it should matter, and depending on it adds complexity.
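
To illustrate the "superset" point, here is a minimal, self-contained sketch of how one class could back both `OpenInputStream` and `OpenInputFile`. The interfaces below are simplified stand-ins that I made up for this comment, not the real `arrow::io` classes (which return `arrow::Result`/`arrow::Status` and have more methods). `BlobFile` is a hypothetical name, and an in-memory string stands in for ranged reads against blob storage.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>
#include <memory>
#include <string>
#include <utility>

// Simplified stand-in for arrow::io::InputStream (assumption: sequential reads only).
class InputStream {
 public:
  virtual ~InputStream() = default;
  // Copies up to nbytes into out, advances the cursor, returns bytes read.
  virtual int64_t Read(int64_t nbytes, void* out) = 0;
  virtual int64_t Tell() const = 0;
};

// Simplified stand-in for arrow::io::RandomAccessFile: everything an
// InputStream offers, plus size, seeking and positional reads.
class RandomAccessFile : public InputStream {
 public:
  virtual int64_t GetSize() const = 0;
  virtual void Seek(int64_t position) = 0;
  // Positional read that leaves the stream cursor untouched.
  virtual int64_t ReadAt(int64_t position, int64_t nbytes, void* out) = 0;
};

// Hypothetical blob-backed file. A real implementation would issue ranged
// downloads in ReadAt instead of copying from an in-memory string.
class BlobFile : public RandomAccessFile {
 public:
  explicit BlobFile(std::string data) : data_(std::move(data)) {}

  int64_t GetSize() const override { return static_cast<int64_t>(data_.size()); }
  int64_t Tell() const override { return pos_; }

  void Seek(int64_t position) override {
    // Clamp the cursor to [0, size].
    pos_ = std::max<int64_t>(0, std::min(position, GetSize()));
  }

  int64_t ReadAt(int64_t position, int64_t nbytes, void* out) override {
    if (position < 0 || position >= GetSize() || nbytes <= 0) return 0;
    const int64_t n = std::min(nbytes, GetSize() - position);
    std::memcpy(out, data_.data() + position, static_cast<size_t>(n));
    return n;
  }

  // The sequential read is just a positional read at the current cursor.
  int64_t Read(int64_t nbytes, void* out) override {
    const int64_t n = ReadAt(pos_, nbytes, out);
    pos_ += n;
    return n;
  }

 private:
  std::string data_;
  int64_t pos_ = 0;
};

// Both entry points can hand out the same class. In this sketch the argument
// is the blob's contents; a real filesystem would take a path.
std::shared_ptr<RandomAccessFile> OpenInputFile(const std::string& contents) {
  return std::make_shared<BlobFile>(contents);
}

std::shared_ptr<InputStream> OpenInputStream(const std::string& contents) {
  return OpenInputFile(contents);  // upcast: a RandomAccessFile is an InputStream
}
```

Nothing in this sketch needs to know whether hierarchical namespace is enabled: a read only needs the blob's size and a byte range.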
