[
https://issues.apache.org/jira/browse/ARROW-15892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17503766#comment-17503766
]
Jonny Fuller edited comment on ARROW-15892 at 3/9/22, 6:10 PM:
---------------------------------------------------------------
My understanding is that the API tries to create the bucket in order to check
whether it exists as part of its standard workflow. The only permission I have
is PutObject within a prefix boundary (i.e. put an object at
/myapp/namespace/<etc>). I'm looking for a way to bypass the bucket-checking
step and simply write the data, as if I were using the {{write_table}} API.
I should add that I cannot list the detailed permissions because they are
granted by a temporary STS session. To my knowledge there is no way to ask STS
"what permissions does this assumed role have?" — you have to actually make
requests and see whether they are denied. I know I can {{PutObject}} because
{{write_table}} (i.e. writing a single Parquet file) works, as do other
single-file operations. Only when I use the higher-level dataset APIs (the
multi-file, partitioned writers) does the operation fail with the
access-control error detailed above.
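To make the contrast concrete, here is a minimal sketch of the two code paths, run against a local filesystem so it is reproducible. On S3 the single-file write needs only PutObject, while the dataset write additionally probes the bucket (the local run succeeds for both; the failure only shows up under PutObject-only S3 credentials):
{code:python}
import os
import tempfile

import pyarrow as pa
import pyarrow.parquet as pq

table = pa.Table.from_batches(
    [pa.record_batch([pa.array([1, 2, 3])], names=["x"])]
)

root = tempfile.mkdtemp()

# Single-file write: on S3 this issues one PutObject per file.
pq.write_table(table, os.path.join(root, "single.parquet"))

# Dataset write: on S3 this path also checks for (and may try to create)
# the bucket, which fails under PutObject-only credentials.
pq.write_to_dataset(table, os.path.join(root, "ds"))
{code}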
> [C++] Dataset APIs require s3:ListBucket Permissions
> ----------------------------------------------------
>
> Key: ARROW-15892
> URL: https://issues.apache.org/jira/browse/ARROW-15892
> Project: Apache Arrow
> Issue Type: Bug
> Reporter: Jonny Fuller
> Priority: Minor
>
> Hi team, first time posting an issue so I apologize if the format is lacking.
> My original comment is on ARROW-13685 Github Issue
> [here|https://github.com/apache/arrow/pull/11136#issuecomment-1062406820].
> Long story short, our environment is super locked down, and while my
> application has permission to write data under an S3 prefix, I do not have
> the {{ListBucket}} permission, nor can I add it. This does not prevent me from
> using the "individual" file APIs like {{pq.write_table}}, but the bucket
> validation logic in the "dataset" APIs breaks when testing for the
> bucket's existence.
> {code:python}
> pq.write_to_dataset(pa.Table.from_batches([data]), location, filesystem=s3fs)
> {code}
> {code}
> OSError: When creating bucket '<my bucket>': AWS Error [code 15]: Access Denied
> {code}
> The same is true for the generic {{pyarrow.dataset}} APIs. My understanding
> is that the bucket validation logic lives in the C++ code, not the Python
> API. As a Pythonista who knows nothing of C++, I am not sure how to resolve
> this problem.
>
> Would it be possible to disable the bucket existence check with an optional
> keyword argument? Thank you for your time!
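> For illustration only, such an opt-out might look like the sketch below. The
> {{check_bucket_existence}} keyword is hypothetical — it does not exist in the
> current API and only illustrates the requested behavior:
> {code:python}
> # Hypothetical: skip the bucket probe and go straight to PutObject calls.
> pq.write_to_dataset(
>     pa.Table.from_batches([data]),
>     location,
>     filesystem=s3fs,
>     check_bucket_existence=False,  # hypothetical keyword, not a real parameter
> )
> {code}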
>
--
This message was sent by Atlassian Jira
(v8.20.1#820001)