> I think my question is still relevant: no matter what semantics `S3FileSystem` is trying to provide, I'm still not sure how the placeholder object helps. I assume it's for listing objects, but what else?
If I have a local filesystem and I delete a file /foo/bar, then I still expect the directory /foo to exist.

```
mkdir /foo
touch /foo/bar
rm /foo/bar
ls /  # should show /foo
```

In an object store there is no `mkdir`, so once I remove /foo/bar, nothing guarantees that /foo will still exist.

On Fri, Jul 12, 2024, 2:50 PM Aldrin <octalene....@pm.me.invalid> wrote:

> But I think the issue being addressed [1] is essentially, "`delete_file` shouldn't create additional files/directories in S3."
>
> I think discussion about the semantics at large is interesting but may be a digression? Also, I think there are varying degrees of "filesystem semantics" that are even being discussed (the naming system and hierarchical inode structure vs. atomicity of read/write operations).
>
> I think my question is still relevant: no matter what semantics `S3FileSystem` is trying to provide, I'm still not sure how the placeholder object helps. I assume it's for listing objects, but what else?
>
> [1]: https://github.com/apache/arrow/issues/36275
>
> # ------------------------------
> # Aldrin
>
> https://github.com/drin/
> https://gitlab.com/octalene
> https://keybase.io/octalene
>
> On Friday, July 12th, 2024 at 14:26, Raphael Taylor-Davies <r.taylordav...@googlemail.com.INVALID> wrote:
>
> > > Many people are familiar with object stores these days. You could create a new abstraction `ObjectStore` which is very similar to `FileSystem` except the semantics are object store semantics and not filesystem semantics.
> >
> > FWIW in the Arrow Rust ecosystem we only provide an object store abstraction, and this has served us very well.
> > My 2 cents is that, for the vast majority of use cases, object store semantics are sufficient for, if not superior [1] to, filesystem-based interfaces. The few workloads that aren't sufficiently served require such close integration with often OS-specific filesystem APIs and behaviours as to make building a coherent abstraction extremely difficult.
> >
> > Iceberg also took a similar approach with its File IO abstraction [2].
> >
> > [1]: https://docs.rs/object_store/latest/object_store/#why-not-a-filesystem-interface
> > [2]: https://tabular.io/blog/iceberg-fileio-cloud-native-tables/
> >
> > On 12/07/2024 22:05, Weston Pace wrote:
> >
> > > > The markers are necessary to offer file system semantics on top of object stores. You will get a ton of subtle bugs otherwise.
> > > >
> > > > Yes, object stores and filesystems are different. If you expect your filesystem to act like a filesystem then these things need to be done in order to avoid these bugs.
> > >
> > > If an option modifies a filesystem to behave more like an object store, then I don't think it's necessarily a bad thing as long as it isn't the default. By turning on the option the user is intentionally altering the behavior and should not hold the same expectations.
> > >
> > > On the other hand, there is another approach you could take. Many people are familiar with object stores these days. You could create a new abstraction `ObjectStore` which is very similar to `FileSystem` except the semantics are object store semantics and not filesystem semantics. I believe most of our filesystem classes could implement both `ObjectStore` and `FileSystem` abstractions without significant code duplication.
> > >
> > > This way, if a user wants filesystem semantics, they use a `FileSystem` and they pay the abstraction cost.
> > > If a user is comfortable with `ObjectStore` semantics, they use `ObjectStore` and they don't have to pay the costs.
> > >
> > > This would be more work than just allowing options to violate `FileSystem` guarantees, but it would provide a clearer distinction between the two.
> > >
> > > On Fri, Jul 12, 2024 at 9:25 AM Aldrin <octalene....@pm.me.invalid> wrote:
> > >
> > > > Hello!
> > > >
> > > > This may be naive, but why does the empty directory marker need to exist on the S3 side at all? If a local directory is created (because of filesystem semantics), then I am not sure why a fake object needs to exist on the object-store side.
> > > >
> > > > On Friday, July 12th, 2024 at 08:35, Felipe Oliveira Carvalho <felipe...@gmail.com> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > The markers are necessary to offer file system semantics on top of object stores. You will get a ton of subtle bugs otherwise.
> > > > >
> > > > > If instead of arrow::FileSystem, Arrow offered an arrow::ObjectStore interface that wraps local filesystems and object stores with object-store semantics (i.e. no concept of empty directory or atomic directory deletion), then application developers would have more control over the actions performed on the object store they are using. The cons would be slower operations when working with a local filesystem and no concept of a directory.
> > > > >
> > > > > > 1. Add an Option: Introduce an option in S3Options to control whether empty directory markers are created, giving users the choice.
> > > > >
> > > > > Then it wouldn't be an honest implementation of arrow::FileSystem for the reasons listed above.
> > > > >
> > > > > > 2. Change Default Behavior: Modify the default behavior to avoid creating empty directory markers when a file is deleted.
> > > > >
> > > > > That would bring in the bugs, because an arrow::FileSystem instance would behave differently depending on what is backing it.
> > > > >
> > > > > > 3. Smarter Directory Creation: Improve the implementation to check for other objects in the same path before creating an empty directory marker.
> > > > >
> > > > > This might be a problem when more than one client or thread is mutating the object store through the arrow::FileSystem. You can check now, and by the time you're done deleting, all the other files you thought existed may have been deleted as well. This is very likely if clients decide to implement parallel deletion.
> > > > >
> > > > > The existing solution of always creating a marker when done is not perfect either, but it is less likely to break.
> > > > >
> > > > > ## Suggested Workaround
> > > > >
> > > > > Avoid file-by-file operations so that internal functions can batch as much as possible.
> > > > >
> > > > > --
> > > > > Felipe
> > > > >
> > > > > On Fri, Jul 12, 2024 at 7:22 AM Hyunseok Seo <hsseo0...@gmail.com> wrote:
> > > > >
> > > > > > Hello, community!
> > > > > >
> > > > > > I am currently working on addressing the issue described in [C++] Add option to not create parent directory with S3 delete_file. In this process, I have found it necessary to gather feedback on how best to resolve this issue. Below is a summary and some questions I have for the community.
> > > > > >
> > > > > > ### Background
> > > > > > Currently, the S3FileSystem creates an empty directory marker (by calling the EnsureParentExists function) when a file is deleted and its directory becomes empty. This behavior maintains the appearance of the directory structure. However, users have raised concerns about this behavior in issue [1].
> > > > > >
> > > > > > ### Why Maintain Empty Directory Markers?
> > > > > > From what I understand, object stores like S3 do not have a concept of directories. The motivation behind maintaining these markers could be to manage the object store as if it were a traditional file system. If anyone knows the context behind the implementation of S3FileSystem, it would be great if you could share it.
> > > > > >
> > > > > > ### Issues with Marker Creation
> > > > > > Users who have raised concerns about the creation of empty directory markers cite the following reasons:
> > > > > >
> > > > > > - Increase in Unnecessary Requests [2]: Creating empty directory markers leads to additional S3 requests, which can increase costs and affect performance.
> > > > > > - File System Consistency Issues [1]: S3 is designed as an object store, and creating empty directory markers can break the inherent consistency of the file system.
> > > > > >
> > > > > > ### Proposed Solutions
> > > > > > Issue [1] suggests the following approaches:
> > > > > >
> > > > > > 1. Add an Option: Introduce an option in S3Options to control whether empty directory markers are created, giving users the choice.
> > > > > > 2. Change Default Behavior: Modify the default behavior to avoid creating empty directory markers when a file is deleted.
> > > > > > 3. Smarter Directory Creation: Improve the implementation to check for other objects in the same path before creating an empty directory marker.
> > > > > >
> > > > > > Here is my personal thought (approach 1 + 3):
> > > > > >
> > > > > > (approach 1) I believe it would be best to make marker creation an option (as some users might not want this enhancement).
> > > > > >
> > > > > > (approach 3) When the option is enabled, if there are no files (objects) in the path (prefix) corresponding to a directory in the file system sense, we should maintain the marker. Otherwise, we should check the number of files in the same path and avoid calling EnsureParentExists if there are two or more files.
> > > > > >
> > > > > > On the other hand, I also feel that this approach might make the logic more complicated.
> > > > > >
> > > > > > ### We Would Like Your Feedback
> > > > > > - What are your thoughts on the creation of empty directory markers?
> > > > > > - Which of the proposed solutions do you prefer?
> > > > > > - Do you have any additional suggestions or comments?
> > > > > >
> > > > > > We appreciate your valuable feedback and aim to find the best solution based on your input.
> > > > > >
> > > > > > Thank you.
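To make the /foo/bar example from the top of this thread concrete, here is a minimal sketch in Python (chosen only for brevity; Arrow's S3FileSystem is C++) of a flat key-value store in which "directories" exist only as shared key prefixes. `ToyObjectStore`, `keep_parent`, and `check_siblings` are hypothetical names, not part of Arrow: `keep_parent=True` loosely stands in for the current marker-creating behavior, `keep_parent=False` for an opt-out option (approach 1), and `check_siblings=True` for approach 3.

```python
# Toy model of directory markers on a flat key-value object store.
# Every name here (ToyObjectStore, keep_parent, check_siblings) is hypothetical;
# this is NOT the Arrow S3FileSystem API.
from __future__ import annotations


class ToyObjectStore:
    def __init__(self) -> None:
        # Flat namespace: key -> bytes. "Directories" exist only as key prefixes.
        self.objects: dict[str, bytes] = {}

    def put(self, key: str, data: bytes = b"") -> None:
        self.objects[key] = data

    def list_dir(self, path: str) -> set[str]:
        """Immediate children of `path`, inferred purely from key names."""
        prefix = path + "/" if path and not path.endswith("/") else path
        children = set()
        for key in self.objects:
            if key.startswith(prefix) and key != prefix:
                # "foo/bar" listed at "" -> "foo"; a marker "foo/" -> "foo"
                children.add(key[len(prefix):].split("/", 1)[0])
        return children

    def delete_file(self, key: str, keep_parent: bool = True,
                    check_siblings: bool = False) -> None:
        del self.objects[key]
        if "/" not in key or not keep_parent:
            return  # top-level key, or the caller opted out of markers
        parent = key.rsplit("/", 1)[0] + "/"
        if check_siblings and any(k.startswith(parent) for k in self.objects):
            # Approach 3: another object still implies the parent, so skip the
            # marker. This is only a snapshot: a concurrent client may delete
            # that sibling right after this check, leaving no marker at all.
            return
        # Write an empty placeholder object so the parent keeps "existing".
        self.objects.setdefault(parent, b"")


if __name__ == "__main__":
    store = ToyObjectStore()
    store.put("foo/bar")
    print(store.list_dir(""))        # {'foo'}
    store.delete_file("foo/bar", keep_parent=False)
    print(store.list_dir(""))        # set(): /foo silently disappeared

    store.put("foo/bar")
    store.delete_file("foo/bar")     # default: leave a "foo/" marker behind
    print(store.list_dir(""))        # {'foo'}
    print(sorted(store.objects))     # ['foo/']
```

Without a marker, deleting the only key under /foo makes /foo vanish from listings, because in a flat namespace the directory was never anything more than a shared prefix. The `check_siblings` path avoids a redundant marker write while another key still implies the parent, but, as Felipe notes, that check can be invalidated by a concurrent delete.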