ayush1300 commented on code in PR #7834:
URL: https://github.com/apache/hadoop/pull/7834#discussion_r2247805315
##########
hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/s3aTagging.md:
##########
@@ -0,0 +1,298 @@
+# S3 Object Tagging Support in Hadoop S3A Filesystem
+
+## Overview
+
+The Hadoop S3A filesystem connector now supports S3 object tagging, allowing users to automatically assign metadata tags to S3 objects during creation and soft deletion operations. This feature enables better data organization, cost allocation, access control, and lifecycle management for S3-stored data.
+
+**JIRA Issue**: [HADOOP-19536](https://issues.apache.org/jira/browse/HADOOP-19536#s3-tags)
+
+## Table of Contents
+
+- [Motivation](#motivation)
+- [S3 Object Tagging Capabilities](#s3-object-tagging-capabilities)
+- [Use Cases](#use-cases)
+- [Configuration](#configuration)
+- [Usage Examples](#usage-examples)
+- [Soft Delete Feature](#soft-delete-feature)
+- [Best Practices](#best-practices)
+- [Limitations](#limitations)
+
+## Motivation
+
+Amazon S3 supports tagging objects with key-value pairs, providing several critical benefits:
+
+1. **Cost Allocation**: Track and allocate S3 storage costs across departments, projects, or cost centers
+2. **Access Control**: Use tags in IAM policies to control object access permissions
+3. **Lifecycle Management**: Trigger automated lifecycle policies for object transitions and expiration
+4. **Data Classification**: Organize and classify data for compliance, security, and business requirements
+5. **Analytics and Reporting**: Enable detailed analytics and reporting based on object metadata
+
+Previously, the Hadoop S3A connector lacked native support for object tagging, requiring users to implement custom solutions or use separate tools to tag objects post-creation.
+
+## S3 Object Tagging Capabilities
+
+### Tag Specifications
+- **Maximum Tags**: Up to 10 tags per object
+- **Structure**: Key-value pairs
+- **Key Length**: Up to 128 Unicode characters
+- **Value Length**: Up to 256 Unicode characters
+- **Case Sensitivity**: Keys and values are case-sensitive
+- **Uniqueness**: Tag keys must be unique per object (no duplicate keys)
+
+### Allowed Characters
+Tag keys and values can contain:
+- Letters (a-z, A-Z)
+- Numbers (0-9)
+- Spaces
+- Special symbols: `. : + - = _ / @`
+
+## Use Cases
+
+### 1. Access Control with IAM Policies
+
+Control object access based on tags:
+
+```json
+{
+  "Effect": "Allow",
+  "Action": "s3:GetObject",
+  "Resource": "*",
+  "Condition": {
+    "StringEquals": {
+      "s3:ExistingObjectTag/department": "finance"
+    }
+  }
+}
+```
+
+### 2. Lifecycle Management
+
+Trigger lifecycle rules based on tags:
+
+```json
+{
+  "Rules": [
+    {
+      "Status": "Enabled",
+      "Filter": {
+        "Tag": {
+          "Key": "retention",
+          "Value": "temporary"
+        }
+      },
+      "Expiration": {
+        "Days": 30
+      }
+    }
+  ]
+}
+```
+
+### 3. Cost Allocation and Tracking
+
+- Use tags for cost tracking in AWS Cost Explorer
+- Allocate costs across different business units or projects
+- Generate detailed billing reports by tag dimensions
+
+### 4. Data Analytics and Filtering
+
+- Use S3 Analytics to filter and analyze data by tags
+- Create custom reports based on tagged object metadata
+- Enable data governance and compliance reporting
+
+## Configuration
+
+### Object Creation Tags
+
+#### Method 1: Comma-Separated List
+```properties
+fs.s3a.object.tags=department=finance,project=alpha,owner=data-team
+```
+
+#### Method 2: Individual Tag Properties
+```properties
+fs.s3a.object.tag.department=finance
+fs.s3a.object.tag.project=alpha
+fs.s3a.object.tag.owner=data-team
+fs.s3a.object.tag.environment=production
+```
+
+### Soft Delete Tags
+```properties
+fs.s3a.soft.delete.enabled=true

Review Comment:
1.
Yes, the object will be tagged with the tag the user supplies, or with a default deletion tag if none is given.
2. It is for recovery: users can archive S3 objects on the basis of their tags and recover them later when needed.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org