Hello. Thank you for your feedback!!
> In which situation does this make a sizable difference in number of
> requests?

The issue I am addressing does not completely resolve the problem; there is also the problem caused by *EnsureParentExists*, as described in [2]:

*The 42,129 requests with the type REST.PUT.OBJECT are due to the implementation of move() and delete(): it currently attempts to [re-create the parent directory](https://github.com/apache/arrow/blob/b448b33808f2dd42866195fa4bb44198e2fc26b9/cpp/src/arrow/filesystem/s3fs.cc#L2849) after a copy or delete. This is because if there is only a single file in the prefix and we move/delete it, then the prefix will no longer exist. The workaround as implemented is to create a 0-sized object with the name of the prefix, ensuring that it still "exists".*

...

*One major thing that worries me [is the EnsureParentExists()](https://github.com/apache/arrow/blob/b448b33808f2dd42866195fa4bb44198e2fc26b9/cpp/src/arrow/filesystem/s3fs.cc#L2521) that is called from DeleteDir, DeleteFile and Move methods. In a versioned bucket, this will repeatedly create empty keys to mimic a directory.*

> Which "inherent consistency" are we talking about concretely?

I meant that users who use S3 purely as an object store do not want unnecessary 0-byte objects to be created just to represent directories. For example:

    $ aws s3 ls s3://bucket/prefix/
    2023-06-23 11:51:02 0 prefix/01/
    2023-06-23 11:35:24 1438 prefix/01/file2.json.gz
    2023-06-23 10:47:18 819 prefix/01/file3.json.gz

> I don't know what this would achieve, and it would in itself issue
> additional "unnecessary requests".

You are right, and I agree. This goes beyond what a library should do. If needed, users should be able to combine the functions available through S3FileSystem to optimize their own use.

Thank you once again for your feedback. I will proceed with improving this by adding an option to *S3Options*.

Regards,
Hyunseok Seo
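For readers who want the request-count argument in concrete terms, below is a toy model in Python. It is a sketch under stated assumptions, not Arrow's implementation, and it does not call Arrow or any AWS SDK. `FakeBucket`, `Options.create_dir_markers`, `delete_file` and `run` are hypothetical names. The "ensure parent exists" step is simplified to one existence check (counted as a LIST) followed by a 0-byte PUT when the deleted object was the last one under its prefix, which follows the behavior quoted from [2].

```python
from dataclasses import dataclass, field


@dataclass
class FakeBucket:
    """Hypothetical in-memory stand-in for a flat S3 key space."""
    keys: dict = field(default_factory=dict)      # key -> size in bytes
    requests: dict = field(default_factory=dict)  # request type -> count

    def _count(self, kind: str) -> None:
        self.requests[kind] = self.requests.get(kind, 0) + 1

    def put(self, key: str, size: int = 0) -> None:
        self._count("PUT")
        self.keys[key] = size

    def delete(self, key: str) -> None:
        self._count("DELETE")
        self.keys.pop(key, None)

    def has_prefix(self, prefix: str) -> bool:
        # Modeled as one LIST request: does any key start with this prefix?
        self._count("LIST")
        return any(k.startswith(prefix) for k in self.keys)


@dataclass
class Options:
    # Hypothetical flag, in the spirit of the proposed S3Options setting.
    create_dir_markers: bool = True


def parent_prefix(key: str) -> str:
    # "prefix/01/file2.json.gz" -> "prefix/01/"; top-level keys have no parent.
    head, sep, _ = key.rpartition("/")
    return head + sep


def delete_file(bucket: FakeBucket, key: str, opts: Options) -> None:
    bucket.delete(key)
    if not opts.create_dir_markers:
        return
    parent = parent_prefix(key)
    # Simplified "ensure parent exists": if the deleted object was the last
    # one under its prefix, write a 0-byte object named after the prefix.
    if parent and not bucket.has_prefix(parent):
        bucket.put(parent, 0)


def run(create_dir_markers: bool) -> FakeBucket:
    # Three prefixes, each holding a single file, then delete every file.
    bucket = FakeBucket()
    for i in range(3):
        bucket.put(f"prefix/{i:02d}/file.json.gz", 100)
    bucket.requests.clear()  # count only the requests made by the deletes
    opts = Options(create_dir_markers=create_dir_markers)
    for i in range(3):
        delete_file(bucket, f"prefix/{i:02d}/file.json.gz", opts)
    return bucket


if __name__ == "__main__":
    with_markers = run(True)
    print(with_markers.requests)      # {'DELETE': 3, 'LIST': 3, 'PUT': 3}
    print(sorted(with_markers.keys))  # ['prefix/00/', 'prefix/01/', 'prefix/02/']
    without_markers = run(False)
    print(without_markers.requests)   # {'DELETE': 3}
    print(without_markers.keys)       # {}
```

In this model, every delete that empties a prefix costs one extra existence check and one extra PUT, so the workload above makes three times as many requests with markers enabled. In a versioned bucket, each of those PUTs would also leave a new object version behind, which is the concern raised in [2]. With the flag off, the emptied prefixes simply stop existing, which is exactly the trade-off an opt-out option would expose to users.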
On Fri, Jul 12, 2024 at 7:52 PM, Antoine Pitrou <anto...@python.org> wrote:
>
> Hi,
>
> On 12/07/2024 at 12:21, Hyunseok Seo wrote:
> >
> > *### Why Maintain Empty Directory Markers?*
> > From what I understand, object stores like S3 do not have a concept of
> > directories. The motivation behind maintaining these markers could be to
> > manage the object store as if it were a traditional file system.
>
> Also, to maintain compatibility with other filesystem-like abstractions
> over S3.
>
> > *### Issues with Marker Creation*
> > Users who have raised concerns about the creation of empty directory
> > markers cite the following reasons:
> >
> > - **Increase in Unnecessary Requests [2]**: Creating empty directory
> > markers leads to additional S3 requests, which can increase costs and
> > affect performance.
>
> In which situation does this make a sizable difference in number of
> requests?
>
> > - **File System Consistency Issues [1]**: S3 is designed as an object
> > store, and creating empty directory markers can break the inherent
> > consistency of the file system.
>
> Which "inherent consistency" are we talking about concretely?
>
> > *### Proposed Solutions*
> > Issue [1] suggests the following approaches:
> >
> > 1. **Add an Option**: Introduce an option in *S3Options* to control whether
> > empty directory markers are created, giving users the choice.
>
> That sounds ok to me.
>
> > 3. **Smarter Directory Creation**: Improve the implementation to check for
> > other objects in the same path before creating an empty directory marker.
>
> I don't know what this would achieve, and it would in itself issue
> additional "unnecessary requests".
>
> Regards
>
> Antoine.