[jira] [Commented] (ARROW-16746) [C++][Python] S3 tag support on write

Steve Loughran (Jira) Fri, 10 Jun 2022 07:06:04 -0700


    [ 
https://issues.apache.org/jira/browse/ARROW-16746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17552769#comment-17552769
 ]


Steve Loughran commented on ARROW-16746:
----------------------------------------

Hadoop S3A maps user attributes to the filesystem XAttr APIs, and very soon it 
will also let you set them when you create a file.

> [C++][Python] S3 tag support on write
> -------------------------------------
>
>                 Key: ARROW-16746
>                 URL: https://issues.apache.org/jira/browse/ARROW-16746
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++, Python
>            Reporter: André Kelpe
>            Priority: Major
>
> S3 allows tagging data to better organize one's data 
> (https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-tagging.html). 
> We use this for efficient downstream processes/inventory management.
> Currently arrow/pyarrow does not allow tags to be added on write. This is 
> causing us to scan the bucket and re-apply the tags after a pyarrow-based 
> process has run.
> I looked through the code and think that it could potentially be done via the 
> metadata mechanism.
> The tags need to be added to the CreateMultipartUploadRequest here: 
> https://github.com/apache/arrow/blob/master/cpp/src/arrow/filesystem/s3fs.cc#L1156
> See also
> http://sdk.amazonaws.com/cpp/api/LATEST/class_aws_1_1_s3_1_1_model_1_1_create_multipart_upload_request.html#af791f34a65dc69bd681d6995313be2da
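
For reference, S3 expects the tag set on an upload request to be passed as a single string encoded as URL query parameters, and it documents limits of 10 tags per object, 128 characters per key and 256 per value. Below is a minimal sketch of building such a string. The helper name encode_tagging and the sample tags are hypothetical, and how the string would be plumbed through arrow's metadata mechanism into the request is an assumption, not something settled in this issue.

```python
from urllib.parse import quote, urlencode

# Limits documented for S3 object tags.
MAX_TAGS = 10
MAX_KEY_LEN = 128
MAX_VALUE_LEN = 256


def encode_tagging(tags):
    """Encode a dict of tags as a URL query string, e.g. 'k1=v1&k2=v2'.

    Hypothetical helper: it only builds and validates the string, it does
    not talk to S3.
    """
    if len(tags) > MAX_TAGS:
        raise ValueError(f"at most {MAX_TAGS} tags are allowed per object")
    for key, value in tags.items():
        if not 1 <= len(key) <= MAX_KEY_LEN:
            raise ValueError(f"tag key must be 1..{MAX_KEY_LEN} characters: {key!r}")
        if len(value) > MAX_VALUE_LEN:
            raise ValueError(f"tag value for {key!r} exceeds {MAX_VALUE_LEN} characters")
    # quote_via=quote percent-encodes spaces as %20 rather than '+'.
    return urlencode(tags, quote_via=quote)


if __name__ == "__main__":
    # Sample tags, purely illustrative.
    print(encode_tagging({"project": "inventory", "owner": "data team"}))
```

Validating on the client side like this is a design choice: it surfaces a bad tag set before the multipart upload starts instead of as a failed request.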



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
