martin-traverse opened a new issue, #38618:
URL: https://github.com/apache/arrow/issues/38618

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   I think I have found a regression in S3FileSystem in PyArrow 14. With 
PyArrow 13, delete_dir() removes a "directory" object and all its children, 
i.e. it deletes everything with the directory prefix. With PyArrow 14 this no 
longer works: after calling delete_dir(), the object still exists (I am 
checking that get_file_info(path).type != FileType.NotFound). The issue also 
seems to affect delete_dir_contents(). delete_dir() still works on empty 
directories, and if I first remove all the objects inside a directory, the 
subsequent delete_dir() call succeeds.
   
   Could this be related to the AWS call for removing objects by prefix? In 
the native APIs for GCP and Azure this is not possible, and objects need to be 
listed and deleted in batches. I believe delete by prefix is available for S3, 
so I'm guessing the S3FileSystem implementation uses that feature. These 
low-level details are hidden by the Arrow FS abstraction, though, so while 
delete_dir() is not working I see no effective workaround short of writing an 
implementation against the native AWS SDK. I'd prefer not to do that for the 
sake of one call!
   
   I have seen this issue on macOS with Python 3.12 and 3.10, and on Ubuntu 
with Python 3.11. In all cases Arrow 13 worked and Arrow 14 did not. I haven't 
tried Arrow 14 on Windows, but I do know Arrow 13 worked there. I'm also 
assuming this is in the C++ implementation, so it could affect more than just 
the Python component, but I have only tested with Python.
   
   Please let me know if there's anything I can do to help diagnose this issue. 
If it is a regression, a fix in 14.0.1 would be amazing, but I don't know how 
feasible that is.
   
   ### Component(s)
   
   Python

