[
https://issues.apache.org/jira/browse/ARROW-13685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17411464#comment-17411464
]
Weston Pace commented on ARROW-13685:
-------------------------------------
Hmm, datasets are filesystem-agnostic, so I'd prefer to avoid `create_bucket`
as a dataset option. It could be an option on the S3 filesystem
(create_bucket_if_not_exists or something), but my preferred fix is to modify
S3FileSystem::CreateBucket so that it first checks whether the bucket exists.
I suspect the current implementation skips that check just to reduce
complexity (why make more calls than needed?), but it isn't working here, so
the more complex solution is warranted. I'll put a PR up for this soon. The
regression test will probably be trickier than the implementation.
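
To make the check-before-create idea concrete, here is a rough sketch in plain
Python. It is not the Arrow C++ code and not the AWS API: `InMemoryStore`,
`AccessDenied`, `bucket_exists`, `create_bucket` and `ensure_bucket` are all
hypothetical names made up for illustration. The store only simulates a caller
who may use an existing bucket but is not allowed to create one.

```python
class AccessDenied(Exception):
    """Stand-in for the 'Access Denied' error seen in this issue."""


class InMemoryStore:
    """Hypothetical object store (assumption: not a real Arrow or AWS class)."""

    def __init__(self, existing=(), can_create=True):
        self.buckets = set(existing)
        self.can_create = can_create
        self.create_calls = 0

    def bucket_exists(self, name):
        return name in self.buckets

    def create_bucket(self, name):
        # Simulates a caller without permission to create buckets.
        self.create_calls += 1
        if not self.can_create:
            raise AccessDenied(f"When creating bucket '{name}': Access Denied")
        self.buckets.add(name)


def ensure_bucket(store, name):
    """Create the bucket only if it is missing; return True if it was created."""
    if store.bucket_exists(name):
        return False  # Nothing to do, so no create call is ever issued.
    store.create_bucket(name)
    return True


# A caller without create permission can still use an existing bucket.
store = InMemoryStore(existing={"my-bucket"}, can_create=False)
ensure_bucket(store, "my-bucket")
```

The cost is one extra existence call per create, which is the added complexity
mentioned above. The sketch also ignores the race where the bucket appears or
disappears between the check and the create.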
> [Python] Cannot write dataset to S3FileSystem if bucket already exists
> ----------------------------------------------------------------------
>
> Key: ARROW-13685
> URL: https://issues.apache.org/jira/browse/ARROW-13685
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 5.0.0
> Reporter: Caleb Overman
> Priority: Major
>
> I'm trying to write a parquet file to an existing S3 bucket using the new
> S3FileSystem interface. However, this is failing with an AWS Access Denied
> error (I do have necessary access). It appears to be trying to recreate the
> bucket which already exists.
> {code:python}
> import numpy as np
> import pyarrow as pa
> from pyarrow import fs
> import pyarrow.dataset as ds
>
> s3 = fs.S3FileSystem(region="us-west-2")
> table = pa.table({"a": range(10), "b": np.random.randn(10), "c": [1, 2] * 5})
> ds.write_dataset(
>     table,
>     "my-bucket/test.parquet",
>     format="parquet",
>     filesystem=s3,
> )
> {code}
> {code:java}
> OSError: When creating bucket 'my-bucket': AWS Error [code 15]: Access Denied
> {code}
> I'm seeing the same behavior using {{S3FileSystem.create_dir}} when
> {{recursive=True}}.
> {code:python}
> s3.create_dir("my-bucket/test_dir/", recursive=True)   # Fails
> s3.create_dir("my-bucket/test_dir/", recursive=False)  # Succeeds
> {code}
>