[ https://issues.apache.org/jira/browse/ARROW-16746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17552922#comment-17552922 ]
Steve Loughran commented on ARROW-16746:
----------------------------------------
Yes. We use them a bit in the S3A committers, to annotate a zero-byte marker
file with the length it will finally have once manifested at its destination.
In HADOOP-17833 that's being exposed in the createFile(path) builder API,
where apps can set headers at create time. Presumably GCS and Azure could be
wired up differently; they both have the advantage that you can edit file
attributes after creation.
> [C++][Python] S3 tag support on write
> -------------------------------------
>
> Key: ARROW-16746
> URL: https://issues.apache.org/jira/browse/ARROW-16746
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++, Python
> Reporter: André Kelpe
> Priority: Major
>
> S3 allows tagging data to better organize one's data
> (https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-tagging.html).
> We use this for efficient downstream processes/inventory management.
> Currently arrow/pyarrow does not allow tags to be added on write. This is
> causing us to scan the bucket and re-apply the tags after a pyarrow-based
> process has run.
> I looked through the code and think that it could potentially be done via the
> metadata mechanism.
> The tags need to be added to the CreateMultipartUploadRequest here:
> https://github.com/apache/arrow/blob/master/cpp/src/arrow/filesystem/s3fs.cc#L1156
> See also
> http://sdk.amazonaws.com/cpp/api/LATEST/class_aws_1_1_s3_1_1_model_1_1_create_multipart_upload_request.html#af791f34a65dc69bd681d6995313be2da
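For illustration only: S3's CreateMultipartUpload accepts tags through the
x-amz-tagging request header, whose value is a URL-encoded query string such as
"key1=value1&key2=value2". Below is a minimal Python sketch of building that
string. encode_s3_tagging is a hypothetical helper, not part of pyarrow or the
AWS SDK. It checks only the documented per-object limits (at most 10 tags, keys
up to 128 characters, values up to 256 characters), not the allowed character set.

```python
from typing import Mapping
from urllib.parse import quote, urlencode

# Per-object limits from the S3 object tagging documentation.
MAX_TAGS = 10
MAX_KEY_LEN = 128
MAX_VALUE_LEN = 256


def encode_s3_tagging(tags: Mapping[str, str]) -> str:
    """Encode tags as the URL query string carried by the x-amz-tagging header.

    Hypothetical helper for illustration; not an Arrow or AWS SDK API.
    """
    if len(tags) > MAX_TAGS:
        raise ValueError(f"S3 allows at most {MAX_TAGS} tags per object, got {len(tags)}")
    for key, value in tags.items():
        if not key or len(key) > MAX_KEY_LEN:
            raise ValueError(f"tag key must be 1..{MAX_KEY_LEN} characters: {key!r}")
        if len(value) > MAX_VALUE_LEN:
            raise ValueError(f"value for tag {key!r} exceeds {MAX_VALUE_LEN} characters")
    # Sort the pairs so the header value is deterministic.
    # quote_via=quote percent-encodes spaces as %20 (instead of +), and the
    # default safe='' means '/', '&' and '=' inside keys/values are escaped too.
    return urlencode(sorted(tags.items()), quote_via=quote)
```

For example, encode_s3_tagging({"team": "data eng", "env": "prod"}) yields
env=prod&team=data%20eng. In the AWS SDK for C++, this string would presumably
go into the request's Tagging field (SetTagging) on the
CreateMultipartUploadRequest referenced above; how Arrow would surface it (for
instance via the metadata mechanism the issue suggests) is left open here.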
--
This message was sent by Atlassian Jira
(v8.20.7#820007)