martin-traverse opened a new issue, #38618: URL: https://github.com/apache/arrow/issues/38618
### Describe the bug, including details regarding any error messages, version, and platform.

I think I have found a regression for S3FileSystem in PyArrow 14. Using the delete_dir() method with PyArrow 13 will remove a "directory" object and all its children, i.e. it will delete everything with the directory prefix. However, when I switch to PyArrow 14 this no longer works: after calling delete_dir() the object still exists (I am checking with get_file_info() != FileType.NotFound). The issue also seems to affect delete_dir_contents(). The delete_dir() method still works on empty directories, and if I remove all the objects inside a directory and then call delete_dir(), that also works.

Could this be something to do with the AWS call for removing objects by prefix? In the native APIs for GCP / Azure this is not possible and objects need to be listed and deleted in batches, but for S3 delete by prefix is available, so I'm guessing the S3 FileSystem implementation uses this feature? These low-level details are hidden by the Arrow FS abstraction though, so while delete_dir() is not working there is no effective way to work around it without writing an implementation in the native AWS SDK. I'd prefer not to do that for the sake of one call!

I have seen this issue on macOS with Python 3.12 and 3.10, and also on Ubuntu with Python 3.11. In all cases the Arrow 13 version worked and the Arrow 14 version didn't. I haven't tried Arrow 14 on Windows, but I do know Arrow 13 used to work on Windows. Also, I'm assuming this is in the C++ implementation, so it could affect more than just the Python component, but I have only tested with Python.

Please let me know if there's anything I can do to help diagnose this issue. If it is a regression and it's possible to fix it in a 14.0.1 release, that would be amazing, but I don't know how feasible that is.

### Component(s)

Python

--
This is an automated message from the Apache Git Service.
