wjones127 commented on issue #2445: URL: https://github.com/apache/arrow-datafusion/issues/2445#issuecomment-1119796664
> I'm not sure what you mean by this

Sorry that wasn't clear. I pointed out two implementations of an abstraction over object stores (S3, GCS, etc.) that are like filesystems (in that they have a notion of directories, not that they make any guarantees about atomicity). These are used by analytics systems like Dask and PyArrow, so there's some evidence we can build useful query engines on top of such an abstraction. Thanks @alamb for the IOx example.

> Trying to make object storage behave exactly like a filesystem is impossible (e.g. S3 doesn't support CreateIfNotExists), however, my thesis is that no query engine actually wants filesystem semantics,

I largely agree. I think the main thing these "FileSystem" abstractions provide is a notion of "directory", which is important in directory-partitioned datasets. The existing API can handle that fine with a delimiter, but it does seem a little funny that you can provide whatever delimiter you want.

> Could you expand on what you mean by this, do you mean being able to read data written by another system which should be trivial, or are you talking about some sort of API-level integration like FFI?

Yeah, I think as long as you *could* do the expected filesystem operations on top of the API, then that seems fine. For context, I plan to wrap the `ObjectStore` API in a PyArrow-compatible filesystem for use in delta-rs. Hence #2246. But I think I'll scale back my changes in #2246 and remove the `create_dir()` and `remove_dir()` methods if we want to just think of this as an object store abstraction with no awareness of directories.

> Also, @carols10cents spent considerable time sorting out consistent directory semantics for object stores and local files in https://github.com/influxdata/influxdb_iox/blob/main/object_store -- maybe we can just use those semantics (or maybe even the code?)

That sounds very promising, @alamb. Thanks for pointing that out!

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
