Hello, community!

I am currently working on addressing the issue described in [[C++] Add
option to not create parent directory with S3 delete_file](
https://github.com/apache/arrow/issues/36275). In this process, I have
found it necessary to gather feedback on how to best resolve this issue.
Below is a summary and some questions I have for the community.


*### Background*
Currently, the *S3FileSystem* generates an empty directory marker (by
calling the *EnsureParentExists* function) when a file is deleted and the
directory becomes empty. This behavior maintains the appearance of the
directory structure. However, users have raised concerns about it in
[1] and [2].
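
To make the behavior concrete, here is a minimal toy model of it. It is only
a sketch: `ToyBucket`, `DirExists` and the `ensure_parent` flag are
hypothetical names invented for illustration, the bucket is just an
in-memory set of keys, and nothing here calls Arrow or the AWS SDK. It
assumes a "directory" is visible when either a zero-byte marker key `dir/`
or any object key starting with `dir/` exists.

```cpp
#include <cassert>
#include <set>
#include <string>

// Toy stand-in for a bucket: a sorted set of object keys.
// Hypothetical model only; it does not use Arrow or the AWS SDK.
struct ToyBucket {
  std::set<std::string> keys;

  // A "directory" is visible if the marker key "dir/" exists or if any
  // object key starts with "dir/".
  bool DirExists(const std::string& dir) const {
    const std::string prefix = dir + "/";
    auto it = keys.lower_bound(prefix);
    return it != keys.end() && it->compare(0, prefix.size(), prefix) == 0;
  }

  // Delete an object. If ensure_parent is true, recreate the parent
  // marker when the parent would otherwise disappear (the effect
  // described above).
  void DeleteFile(const std::string& key, bool ensure_parent) {
    assert(!key.empty());
    keys.erase(key);
    const auto slash = key.rfind('/');
    if (ensure_parent && slash != std::string::npos &&
        !DirExists(key.substr(0, slash))) {
      keys.insert(key.substr(0, slash + 1));  // zero-byte marker "dir/"
    }
  }
};
```

In this model, deleting the only object under `data/` with
`ensure_parent = false` makes `data` vanish, while `ensure_parent = true`
leaves a `data/` marker behind at the cost of one extra write.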


*### Why Maintain Empty Directory Markers?*
From what I understand, object stores like S3 do not have a concept of
directories. The motivation behind maintaining these markers could be to
manage the object store as if it were a traditional file system. If anyone
knows the context behind the implementation of S3FileSystem, it would be
great if you could share it.


*### Issues with Marker Creation*
Users who have raised concerns about the creation of empty directory
markers cite the following reasons:

- **Increase in Unnecessary Requests [2]**: Creating empty directory
markers leads to additional S3 requests, which can increase costs and
affect performance.
- **File System Consistency Issues [1]**: S3 is designed as an object
store, and creating empty directory markers can break the inherent
consistency of the file system.


*### Proposed Solutions*
Issue [1] suggests the following approaches:

1. **Add an Option**: Introduce an option in *S3Options* to control whether
empty directory markers are created, giving users the choice.
2. **Change Default Behavior**: Modify the default behavior to avoid
creating empty directory markers when a file is deleted.
3. **Smarter Directory Creation**: Improve the implementation to check for
other objects in the same path before creating an empty directory marker.

*Here are my personal thoughts (approach 1 + 3):*

(*approach 1*) I believe it would be best to make marker creation an option
(as some users might not want this enhancement).

(*approach 3*) When the option is enabled, we should keep the marker only if
no files (objects) would remain in the path (prefix) that corresponds to the
directory in file system terms. In practice, this means checking the number
of files in the same path before the deletion and skipping the call to
*EnsureParentExists* if there are two or more.

On the other hand, I also feel that this approach might make the logic more
complicated.
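
As a rough illustration of approach 1 + 3, the decision could look like the
sketch below. Everything in it is an assumption made for discussion:
`ToyOptions::create_parent_marker_on_delete` is a hypothetical flag (not an
existing *S3Options* field), `ToyStore` is an in-memory set of keys rather
than a real S3 client, and `put_requests` only counts marker writes so the
saved request is visible.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>

// Hypothetical option (approach 1); not an existing S3Options field.
struct ToyOptions {
  bool create_parent_marker_on_delete = true;
};

// In-memory stand-in for a bucket; it does not use Arrow or the AWS SDK.
struct ToyStore {
  std::set<std::string> keys;
  int put_requests = 0;  // number of marker writes issued

  // Count keys under "dir/" (markers included), stopping at `limit`,
  // similar in spirit to a listing capped at `limit` results.
  std::size_t CountUnder(const std::string& dir, std::size_t limit) const {
    assert(limit > 0);
    const std::string prefix = dir + "/";
    std::size_t n = 0;
    for (auto it = keys.lower_bound(prefix);
         it != keys.end() && n < limit &&
         it->compare(0, prefix.size(), prefix) == 0;
         ++it) {
      ++n;
    }
    return n;
  }

  void DeleteFile(const std::string& key, const ToyOptions& opts) {
    if (keys.count(key) == 0) return;  // nothing to delete
    const auto slash = key.rfind('/');
    const bool has_parent = slash != std::string::npos;
    // Approach 3: two or more keys before the deletion means the prefix
    // stays non-empty afterwards, so no marker is needed.
    const bool parent_survives =
        has_parent && CountUnder(key.substr(0, slash), 2) >= 2;
    keys.erase(key);
    if (opts.create_parent_marker_on_delete && has_parent &&
        !parent_survives) {
      keys.insert(key.substr(0, slash + 1));  // zero-byte marker "dir/"
      ++put_requests;
    }
  }
};
```

With two objects under `data/`, deleting the first one writes no marker;
deleting the second one writes it only when the option is enabled. The
trade-off is that the check itself is a listing request, so whether this
actually reduces the total number of requests would depend on how it is
implemented.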


*### We Would Like Your Feedback*
- What are your thoughts on the creation of empty directory markers?
- Which of the proposed solutions do you prefer?
- Do you have any additional suggestions or comments?

We appreciate your valuable feedback and aim to find the best solution
based on your input.

Thank you.

[1]: https://github.com/apache/arrow/issues/36275
[2]: https://github.com/apache/arrow/issues/40589
