wjones127 commented on issue #2445:
URL: 
https://github.com/apache/arrow-datafusion/issues/2445#issuecomment-1119796664

   > I'm not sure what you mean by this
   
   Sorry that wasn't clear. I pointed out two implementations of an abstraction 
over object stores (S3, GCS, etc.) that are like filesystems (in that they have 
a notion of directories, not that they make any guarantees about atomicity). 
These are used by analytics systems like Dask and PyArrow, so there's some 
evidence we can build useful query engines on top of such an abstraction.
   
   Thanks @alamb for the IOx example.
   
   > Trying to make object storage behave exactly like a filesystem is 
impossible (e.g. S3 doesn't support CreateIfNotExists), however, my thesis is 
that no query engine actually wants filesystem semantics,
   
   I largely agree. I think the main thing these "FileSystem" abstractions 
provide is a notion of "directory", which is important in directory-partitioned 
datasets. The existing API can handle that fine with delimiter, but it does 
seem a little funny you can provide whatever delimiter you want.
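   
   To make the delimiter point concrete, here is a minimal sketch of how a "directory" falls out of a delimiter-based listing over flat keys. It is not the `ObjectStore` API: `list_with_delimiter` and the sample keys are hypothetical, and a plain Python list stands in for a bucket.

```python
def list_with_delimiter(keys, prefix="", delimiter="/"):
    """Split flat object keys under `prefix` into objects and common prefixes.

    Hypothetical helper: a plain iterable of keys stands in for a bucket.
    """
    objects = []
    common_prefixes = set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        if sep:
            # A delimiter follows the prefix, so the key sits in a "subdirectory".
            common_prefixes.add(prefix + head + delimiter)
        else:
            # No delimiter left: the key is a direct child of the prefix.
            objects.append(key)
    return sorted(objects), sorted(common_prefixes)


# Made-up keys for a directory-partitioned dataset.
keys = [
    "data/year=2021/part-0.parquet",
    "data/year=2021/part-1.parquet",
    "data/year=2022/part-0.parquet",
    "data/_metadata",
]

objects, dirs = list_with_delimiter(keys, prefix="data/")
# With "/" the partitions show up as directories.

odd_objects, odd_dirs = list_with_delimiter(keys, prefix="data/", delimiter="=")
# With "=" both partitions collapse into the single prefix "data/year=".
```

   With `/` the result matches the directory layout a partitioned dataset expects. With an arbitrary delimiter such as `=` the same call still works but the "directories" no longer mean much, which is the part that seems a little funny.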
   
   > Could you expand on what you mean by this, do you mean being able to read 
data written by another system which should be trivial, or are you talking 
about some sort of API-level integration like FFI?
   
   Yeah I think as long as you *could* do the expected filesystem operations on 
top of the API, then that seems fine. For context, I plan to wrap the 
`ObjectStore` API in a PyArrow-compatible filesystem for use in delta-rs. Hence 
#2246.
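   
   As a rough illustration of doing filesystem operations on top of a flat API, here is a sketch of a directory layer over a key-to-bytes mapping. Everything in it is an assumption for illustration: `DirectoryView` is a hypothetical class, a `dict` stands in for the object store, and the zero-byte marker key is just one possible convention. It is neither the `ObjectStore` trait nor PyArrow's filesystem interface.

```python
class DirectoryView:
    """Hypothetical directory layer over a flat key -> bytes mapping."""

    def __init__(self, store, delimiter="/"):
        self.store = store          # dict standing in for an object store
        self.delimiter = delimiter

    def _as_dir(self, path):
        # Normalize "a/b" and "a/b/" to the prefix form "a/b/".
        return path.rstrip(self.delimiter) + self.delimiter

    def create_dir(self, path):
        # Object stores have no directories, so a zero-byte marker key
        # stands in for an empty one (an assumed convention).
        self.store.setdefault(self._as_dir(path), b"")

    def remove_dir(self, path):
        # Delete every key under the prefix, one at a time (not atomic).
        prefix = self._as_dir(path)
        for key in [k for k in self.store if k.startswith(prefix)]:
            del self.store[key]

    def dir_exists(self, path):
        prefix = self._as_dir(path)
        return any(k.startswith(prefix) for k in self.store)


store = {"table/part-0.parquet": b"..."}
fs = DirectoryView(store)
fs.create_dir("table/_delta_log")
log_dir_exists = fs.dir_exists("table/_delta_log")
fs.remove_dir("table")
```

   Because `remove_dir()` deletes key by key, this layer gives a notion of directories without any atomicity guarantee.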
   
   But I think I'll scale back my changes in #2246 and remove the 
`create_dir()` and `remove_dir()` methods if we want to treat this purely as an 
object store abstraction with no awareness of directories.
   
   > Also, @carols10cents spent considerable time sorting out consistent 
directory semantics for object stores and local files in 
https://github.com/influxdata/influxdb_iox/blob/main/object_store -- maybe we 
can just use those semantics (or maybe even the code?)
   
   That sounds very promising, @alamb. Thanks for pointing it out!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
