westonpace commented on issue #10634: URL: https://github.com/apache/arrow/issues/10634#issuecomment-872502967
OK, yes, I looked a little further. I had not realized that the dataset code calls `CreateDir` even when the bucket already exists (it uses `CreateDir` to test whether the bucket exists). So this is ARROW-13228.

If you are able to wait for version 5.0.0 (about a month out), you can get a fix there. Alternatively, you can use the latest nightly build or build from source.

Another workaround may be to use s3fs together with `PyFileSystem` and `FSSpecHandler`:

```
import s3fs
import pyarrow.fs

# Route all filesystem calls through s3fs instead of Arrow's native S3 implementation
s3fs_instance = s3fs.S3FileSystem()
filesystem = pyarrow.fs.PyFileSystem(pyarrow.fs.FSSpecHandler(s3fs_instance))
```

A final workaround could be to create your own filesystem implementation that wraps a `pyarrow.fs.S3FileSystem` instance (e.g. the proxy pattern) and whose `create_dir` simply returns without creating anything.
