singhpk234 commented on code in PR #4443:
URL: https://github.com/apache/iceberg/pull/4443#discussion_r842354307
##########
docs/integrations/aws.md:
##########
@@ -419,9 +419,22 @@ If for any reason you have to use S3A, here are the instructions:

 To ensure integrity of uploaded objects, checksum validations for S3 writes can be turned on by setting catalog property `s3.checksum-enabled` to `true`.
 This is turned off by default.

+### S3 Delete
+
+When `s3.delete-enabled` is disabled, the objects are not deleted from S3.
+For example, to disable deletion with Spark 3.0, you can start the Spark SQL shell with:

Review Comment:
   [minor] Can we add what the default value is (though it might be obvious), as in the checksum verification example above?

##########
docs/integrations/aws.md:
##########
@@ -433,6 +446,25 @@ spark-sql --conf spark.sql.catalog.my_catalog=org.apache.iceberg.spark.SparkCata
 ```

 For the above example, the objects in S3 will be saved with tags: `my_key1=my_val1` and `my_key2=my_val2`.

Review Comment:
   I think we need to call out here that the tags specified are used only in object creation, just like we called out that the delete tags are applied in the delete operation.

##########
docs/integrations/aws.md:
##########
@@ -433,6 +446,25 @@ spark-sql --conf spark.sql.catalog.my_catalog=org.apache.iceberg.spark.SparkCata
 ```

 For the above example, the objects in S3 will be saved with tags: `my_key1=my_val1` and `my_key2=my_val2`.
+We can add tags before deleting the objects as well.
+For example, to add S3 delete tags with Spark 3.0, you can start the Spark SQL shell with:
+
+```sh
+spark-sql --conf spark.sql.catalog.my_catalog=org.apache.iceberg.spark.SparkCatalog \
+    --conf spark.sql.catalog.my_catalog.warehouse=s3://iceberg-warehouse/s3-tagging \
+    --conf spark.sql.catalog.my_catalog.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog \
+    --conf spark.sql.catalog.my_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO \
+    --conf spark.sql.catalog.my_catalog.s3.write.tags.my_key1=my_val1 \
+    --conf spark.sql.catalog.my_catalog.s3.write.tags.my_key2=my_val2 \
+    --conf spark.sql.catalog.my_catalog.s3.delete.tags.my_key3=my_val3 \
+    --conf spark.sql.catalog.my_catalog.s3.delete-enabled=false
+```
+
+When `s3.delete-enabled` is disabled, users are expected to set the delete tags with `s3.delete.tags` and manage the deleted files through an S3 lifecycle policy.
+With the `s3.delete.tags` config, objects are tagged with the configured key-value pairs before deletion. This is considered a soft-delete, because users can configure a tag-based object lifecycle policy at the bucket level to transition objects to different tiers.
+For more details, see the [managing your storage lifecycle documentation](https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-lifecycle-mgmt.html).
+
+For the above example, the objects in S3 will be saved with tags `my_key1=my_val1`, `my_key2=my_val2` and `my_key3=my_val3` before deletion.

Review Comment:
   Since in [L#459](https://github.com/apache/iceberg/pull/4443/files#diff-45ac9a4ad0d5c90123d0537d61bdcd243a7c06b684b39a3914b831ee35c11052R459) we have disabled delete, should we call it soft-delete here? Or could we remove the `s3.delete-enabled` conf from the example? As per my understanding it adds less value in the context of the delete tags now (since we moved that into a separate section).

--
This is an automated message from the Apache Git Service.
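The soft-delete flow discussed above depends on a bucket-level, tag-based lifecycle rule that matches the `s3.delete.tags` key-value pair. As an illustrative sketch only (not part of the PR): the rule name, bucket name, storage tier, and retention days below are hypothetical, and the AWS call uses boto3's `put_bucket_lifecycle_configuration`:

```python
def build_soft_delete_rule(key: str, value: str) -> dict:
    """Build a lifecycle rule matching objects tagged by s3.delete.tags.

    Transition and expiration days are illustrative values, not from the thread.
    """
    return {
        "ID": "iceberg-soft-delete",  # hypothetical rule name
        "Filter": {"Tag": {"Key": key, "Value": value}},
        "Status": "Enabled",
        # Move soft-deleted objects to a cheaper tier, then expire them.
        "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
        "Expiration": {"Days": 365},
    }


def apply_rule(bucket: str, rule: dict) -> None:
    """Attach the rule to a bucket; requires boto3 and AWS credentials."""
    import boto3  # imported here so the sketch runs without boto3 installed

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration={"Rules": [rule]},
    )


if __name__ == "__main__":
    # Matches the s3.delete.tags pair from the spark-sql example above.
    rule = build_soft_delete_rule("my_key3", "my_val3")
    print(rule["Filter"]["Tag"])
    # apply_rule("iceberg-warehouse", rule)  # uncomment with a real bucket
```

With such a rule in place, objects the writer tags before a (disabled) delete are transitioned and eventually expired by S3 itself rather than removed by Iceberg.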
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
