But I think the issue being addressed [1] is essentially, "`delete_file` 
shouldn't create additional files/directories in S3."

I think discussion about the semantics at large is interesting, but it may be a 
digression? Also, I think several different degrees of "filesystem semantics" 
are being discussed (the naming system and hierarchical inode structure vs. 
atomicity of read/write operations).

I think my question is still relevant: no matter what semantics `S3FileSystem` 
is trying to provide, I'm still not sure how the placeholder object helps. I 
assume it's for listing objects, but what else?
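
To make the listing case concrete, here is a minimal sketch using a toy 
in-memory key set rather than real S3 or the actual `S3FileSystem` code. The 
names `ToyObjectStore`, `dir_exists`, and `keep_marker` are hypothetical, and 
I'm assuming the marker is a zero-byte object whose key ends in `/`:

```python
# Toy model of a flat object store: just a set of keys, no real directories.
# This is NOT the Arrow implementation; all names here are made up.
class ToyObjectStore:
    def __init__(self):
        self.keys = set()

    def put(self, key):
        self.keys.add(key)

    def delete(self, key):
        self.keys.discard(key)

    def dir_exists(self, prefix):
        # A "directory" is visible only if some key starts with "<prefix>/".
        # A marker object whose key is exactly "<prefix>/" also counts.
        prefix = prefix.rstrip("/") + "/"
        return any(k.startswith(prefix) for k in self.keys)


def delete_file(store, key, keep_marker):
    """Delete `key`; if `keep_marker` is set, re-create the parent marker."""
    store.delete(key)
    if keep_marker and "/" in key:
        parent = key.rsplit("/", 1)[0]
        store.put(parent + "/")  # assumed zero-byte placeholder object


store = ToyObjectStore()

# Case 1: no marker, so the "directory" disappears with its last object.
store.put("data/part-0.parquet")
delete_file(store, "data/part-0.parquet", keep_marker=False)
without_marker = store.dir_exists("data")

# Case 2: the marker keeps the now-empty "directory" visible in listings.
store.put("data/part-0.parquet")
delete_file(store, "data/part-0.parquet", keep_marker=True)
with_marker = store.dir_exists("data")

print(without_marker, with_marker)
```

In this model the only observable difference is whether the empty prefix 
still shows up after the last object is deleted, which is the listing case I 
mentioned above.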


[1]: https://github.com/apache/arrow/issues/36275


# ------------------------------

# Aldrin


https://github.com/drin/

https://gitlab.com/octalene

https://keybase.io/octalene


On Friday, July 12th, 2024 at 14:26, Raphael Taylor-Davies 
<r.taylordav...@googlemail.com.INVALID> wrote:

> > Many people
> > are familiar with object stores these days. You could create a new
> > abstraction `ObjectStore` which is very similar to `FileSystem` except the
> > semantics are object store semantics and not filesystem semantics.
> 

> FWIW in the Arrow Rust ecosystem we only provide an object store
> abstraction, and this has served us very well. My 2 cents is that object
> store semantics are sufficient, if not superior [1] to filesystem-based
> interfaces, for the vast majority of use cases, with the few
> workloads that aren't sufficiently served requiring such close
> integration with often OS-specific filesystem APIs and behaviours as to
> make building a coherent abstraction extremely difficult.
> 

> Iceberg also took a similar approach with its File IO abstraction [2].
> 

> [1]:
> https://docs.rs/object_store/latest/object_store/#why-not-a-filesystem-interface
> [2]: https://tabular.io/blog/iceberg-fileio-cloud-native-tables/
> 

> On 12/07/2024 22:05, Weston Pace wrote:
> 

> > > The markers are necessary to offer file system semantics on top of object
> > > stores. You will get a ton of subtle bugs otherwise.
> > > Yes, object stores and filesystems are different. If you expect your
> > > filesystem to act like a filesystem then these things need to be done in
> > > order to avoid these bugs.
> > 

> > If an option modifies a filesystem to behave more like an object store then
> > I don't think it's necessarily a bad thing as long as it isn't the
> > default. By turning on the option the user is intentionally altering the
> > behavior and should not be making the same expectations.
> > 

> > On the other hand, there is another approach you could take. Many people
> > are familiar with object stores these days. You could create a new
> > abstraction `ObjectStore` which is very similar to `FileSystem` except the
> > semantics are object store semantics and not filesystem semantics. I
> > believe most of our filesystem classes could implement both `ObjectStore`
> > and `FileSystem` abstractions without significant code duplication.
> > 

> > This way, if a user wants filesystem semantics, they use a `FileSystem` and
> > they pay the abstraction cost. If a user is comfortable with `ObjectStore`
> > semantics they use `ObjectStore` and they don't have to pay the costs.
> > 

> > This would be more work than just allowing options to violate FileSystem
> > guarantees but it would provide a more clear distinction between the two.
> > 

> > On Fri, Jul 12, 2024 at 9:25 AM Aldrin octalene....@pm.me.invalid wrote:
> > 

> > > Hello!
> > > 

> > > This may be naive, but why does the empty directory marker need to exist
> > > on the S3 side at all? If a local directory is created (because filesystem
> > > semantics), then I am not sure why a fake object needs to exist on the
> > > object-store side.
> > > 

> > > # ------------------------------
> > > 

> > > # Aldrin
> > > 

> > > https://github.com/drin/
> > > 

> > > https://gitlab.com/octalene
> > > 

> > > https://keybase.io/octalene
> > > 

> > > On Friday, July 12th, 2024 at 08:35, Felipe Oliveira Carvalho <
> > > felipe...@gmail.com> wrote:
> > > 

> > > > Hi,
> > > > 

> > > > The markers are necessary to offer file system semantics on top of object
> > > > stores. You will get a ton of subtle bugs otherwise.
> > > > 

> > > > If instead of arrow::FileSystem, Arrow offered an arrow::ObjectStore
> > > > interface that wraps local filesystems and object stores with
> > > > object-store semantics (i.e. no concept of empty directory or atomic
> > > > directory deletion), then application developers would have more control
> > > > of the actions performed on the object store they are using. Cons would
> > > > be slower operations when working with a local filesystem and no concept
> > > > of directory.
> > > > 

> > > > > 1. Add an Option: Introduce an option in S3Options to control
> > > > > whether empty directory markers are created, giving users the choice.
> > > > 

> > > > Then it wouldn't be an honest implementation of arrow::FileSystem for the
> > > > reasons listed above.
> > > > 

> > > > > 2. Change Default Behavior: Modify the default behavior to avoid
> > > > > creating empty directory markers when a file is deleted.
> > > > 

> > > > That would bring in the bugs because an arrow::FileSystem instance would
> > > > behave differently depending on what is backing it.
> > > > 

> > > > > 3. Smarter Directory Creation: Improve the implementation to check
> > > > > for other objects in the same path before creating an empty directory
> > > > > marker.
> > > > 

> > > > This might be a problem when more than one client or thread is mutating
> > > > the object store through the arrow::FileSystem. You can check now and,
> > > > by the time you're done deleting, all the other files you thought existed
> > > > have been deleted as well. This is very likely if clients decide to
> > > > implement parallel deletion.
> > > > 

> > > > The existing solution of always creating a marker when done is not perfect
> > > > either, but less likely to break.
> > > > 

> > > > ## Suggested Workaround
> > > > 

> > > > Avoiding file by file operations so that internal functions can batch as
> > > > much as possible.
> > > > 

> > > > --
> > > > Felipe
> > > > 

> > > > On Fri, Jul 12, 2024 at 7:22 AM Hyunseok Seo hsseo0...@gmail.com wrote:
> > > > 

> > > > > Hello, community!
> > > > > 

> > > > > I am currently working on addressing the issue described in "[C++] Add
> > > > > option to not create parent directory with S3 delete_file". In this
> > > > > process, I have found it necessary to gather feedback on how to best
> > > > > resolve this issue. Below is a summary and some questions I have for the
> > > > > community.
> > > > > 

> > > > > ### Background
> > > > > Currently, the S3FileSystem generates an empty directory marker (by
> > > > > calling the EnsureParentExists function) when a file is deleted and the
> > > > > directory becomes empty. This behavior maintains the appearance of the
> > > > > directory structure. However, there have been issues raised by users
> > > > > regarding this behavior in issues 1.
> > > > > 

> > > > > ### Why Maintain Empty Directory Markers?
> > > > > From what I understand, object stores like S3 do not have a concept of
> > > > > directories. The motivation behind maintaining these markers could be to
> > > > > manage the object store as if it were a traditional file system. If
> > > > > anyone knows the context behind the implementation of S3FileSystem, it
> > > > > would be great if you could share it.
> > > > > 

> > > > > ### Issues with Marker Creation
> > > > > Users who have raised concerns about the creation of empty directory
> > > > > markers cite the following reasons:
> > > > > 

> > > > > - Increase in Unnecessary Requests 2: Creating empty directory
> > > > > markers leads to additional S3 requests, which can increase costs and
> > > > > affect performance.
> > > > > - File System Consistency Issues 1: S3 is designed as an object
> > > > > store, and creating empty directory markers can break the inherent
> > > > > consistency of the file system.
> > > > > 

> > > > > ### Proposed Solutions
> > > > > Issue 1 suggests the following approaches:
> > > > > 

> > > > > 1. Add an Option: Introduce an option in S3Options to control whether
> > > > > empty directory markers are created, giving users the choice.
> > > > > 2. Change Default Behavior: Modify the default behavior to avoid
> > > > > creating empty directory markers when a file is deleted.
> > > > > 3. Smarter Directory Creation: Improve the implementation to check for
> > > > > other objects in the same path before creating an empty directory
> > > > > marker.
> > > > > 
> > > > > Here is my personal thought (approach 1 + 3):
> > > > > 

> > > > > (approach 1) I believe it would be best to add the Marker as an option
> > > > > (as some users might not want this enhancement).
> > > > > 

> > > > > (approach 3) When the option is enabled, if there are no files (objects)
> > > > > in the path (prefix) corresponding to a directory based on the file
> > > > > system concept, we should maintain the Marker. Otherwise, we should
> > > > > check the number of files in the same path and avoid calling
> > > > > EnsureParentExists if there are two or more files.
> > > > > 

> > > > > On the other hand, I also feel that this approach might make the logic
> > > > > more complicated.
> > > > > 

> > > > > ### We Would Like Your Feedback
> > > > > - What are your thoughts on the creation of empty directory markers?
> > > > > - Which of the proposed solutions do you prefer?
> > > > > - Do you have any additional suggestions or comments?
> > > > > 

> > > > > We appreciate your valuable feedback and aim to find the best solution
> > > > > based on your input.
> > > > > 

> > > > > Thank you.
