ivandika3 commented on code in PR #9334:
URL: https://github.com/apache/ozone/pull/9334#discussion_r2564207037

########## hadoop-hdds/docs/content/design/s3-conditional-requests.md: ##########
@@ -0,0 +1,149 @@
+---
+title: "S3 Conditional Requests"
+summary: Design to support S3 conditional requests for atomic operations.
+date: 2025-11-20
+jira: HDDS-13117
+status: draft
+author: Chu Cheng Li
+---
+
+# S3 Conditional Requests Design
+
+## Background
+
+AWS S3 supports conditional requests using HTTP conditional headers, which enable atomic operations and cache optimization and help prevent race conditions. This includes:
+
+- **Conditional Writes** (PutObject): `If-Match` and `If-None-Match` headers for atomic operations
+- **Conditional Reads** (GetObject, HeadObject): `If-Match`, `If-None-Match`, `If-Modified-Since`, `If-Unmodified-Since` for cache validation
+- **Conditional Copy** (CopyObject): Conditions on both source and destination objects
+
+### Current State
+
+- HDDS-10656 implemented atomic rewrite using `expectedDataGeneration`
+- OM HA uses a single Raft group with a single applier thread (Ratis StateMachineUpdater)
+- The S3 gateway doesn't pass conditional headers through to the OM layer
+
+## Use Cases
+
+### Conditional Writes
+- **Atomic key rewrites**: Prevent race conditions when updating existing objects
+- **Create-only semantics**: Prevent accidental overwrites (`If-None-Match: *`)
+- **Optimistic locking**: Enable concurrent access with conflict detection
+- **Leader election**: Implement distributed coordination using S3 as a backing store
+
+### Conditional Reads
+- **Bandwidth optimization**: Avoid downloading unchanged objects (304 Not Modified)
+- **HTTP caching**: Support standard browser/CDN caching semantics
+- **Conditional processing**: Only process objects that meet specific criteria
+
+### Conditional Copy
+- **Atomic copy operations**: Copy only if the source/destination meets specific conditions
+- **Prevent overwrite**: Copy only if the destination doesn't exist
+
+## AWS S3 Conditional Write
+
+### Specification
+
+#### If-None-Match Header
+
+```
+If-None-Match: *
+```
+
+- Succeeds only if the object does NOT exist
+- Returns `412 Precondition Failed` if the object exists
+- Primary use case: Create-only semantics
+
+#### If-Match Header
+
+```
+If-Match: "<etag>"
+```
+
+- Succeeds only if the object EXISTS and its ETag matches
+- Returns `412 Precondition Failed` if the object doesn't exist or the ETag doesn't match
+- Primary use case: Atomic updates (compare-and-swap)
+
+#### Restrictions
+
+- Cannot use both headers together in the same request
+- No additional charges for failed conditional requests
+
+### Implementation
+
+#### Architecture Overview
+
+#### If-None-Match Implementation
+
+##### S3 Gateway Layer
+
+1. Parse `If-None-Match: *`.
+2. Set `existingKeyGeneration = -1`.
+3. Call `RpcClient.rewriteKey()`.
+
+##### OM Create Phase
+
+1. Validate `expectedDataGeneration == -1`.
+2. If key exists → throw `KEY_ALREADY_EXISTS`.
+3. Store `-1` in open key metadata.
+
+##### OM Commit Phase
+
+1. Check `expectedDataGeneration == -1` from open key.
+2. If key now exists (race condition) → throw `KEY_ALREADY_EXISTS`.
+3. Commit key.
+
+##### Race Condition Handling
+
+Using `-1` ensures atomicity. If a concurrent write (Client B) commits between Client A's Create and Commit, Client A's commit fails the `-1` validation check (key now exists), preserving strict create-if-not-exists semantics.
+
+#### If-Match Implementation
+
+This leverages the existing `expectedDataGeneration` from HDDS-10656:
+
+##### S3 Gateway Layer
+
+1. Parse the `If-Match: "<etag>"` header.
+2. Look up the existing key via `getS3KeyDetails()`.
+3. Validate that the ETag matches; otherwise throw `PRECOND_FAILED` (412).
+4. Extract `expectedGeneration` from the existing key.
+5. Pass `expectedGeneration` to RpcClient.

Review Comment:
> Note that verifying the ETag during the preExecute phase does not increase the overhead of writing to the Raft log, so we don't need to worry about that.

`preExecute` can be called in parallel (in multiple OM handler threads), so we should verify the ETag in `validateAndUpdateCache` instead to ensure atomicity. Note that the permission check was put in `preExecute` for performance reasons, and the community discussed it and concluded that this consistency tradeoff is acceptable.

> Agree that we need to reduce the RTT for If-Match requests. My original thinking was that I wanted to avoid having "concepts of S3" appear in Ozone Manager, but it seems there are already a lot of them, so I think it's OK to do so; plus, the performance would be better.

Yes, we already have multipart uploads, and `OmKeyInfo.tags`, which is used only for the S3 use case.

> So If-Match requests don't need the atomic key rewrite anymore. But how about we keep using atomic rewrite for If-None-Match requests, with the extended "CREATE IF NOT EXIST" capability that will be added in https://github.com/apache/ozone/pull/9332?

Yes, I'm OK with reusing atomic rewrite for "if-none-match", so we can keep that approach.
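
The If-None-Match flow in the design (expected generation `-1`, validated in the OM create phase and again in the commit phase) can be illustrated with a small in-memory model. This is a hypothetical Python sketch, not Ozone code: `KeyTable`, `OpenKey`, `create`, `commit`, `NOT_EXISTS`, and the exception classes are illustrative names, and generations are assumed to be a simple increasing counter rather than whatever Ozone actually stores. It also covers the ordinary rewrite case (a non-negative expected generation) to show that create-if-not-exists is an extension of the same check.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

NOT_EXISTS = -1  # hypothetical sentinel expected generation: key must not exist


class KeyAlreadyExists(Exception):
    """Stands in for OM's KEY_ALREADY_EXISTS result (surfaced as HTTP 412)."""


class GenerationMismatch(Exception):
    """Key is missing or was rewritten since the client read its generation."""


@dataclass
class OpenKey:
    name: str
    data: bytes
    expected_generation: Optional[int]  # None means an unconditional write


@dataclass
class KeyTable:
    # Committed keys: name -> (generation, data).
    committed: Dict[str, Tuple[int, bytes]] = field(default_factory=dict)
    next_generation: int = 1

    def _check(self, name: str, expected: Optional[int]) -> None:
        """Precondition shared by the create and commit phases."""
        if expected is None:
            return
        current = self.committed.get(name)
        if expected == NOT_EXISTS:
            if current is not None:
                raise KeyAlreadyExists(name)
        elif current is None or current[0] != expected:
            raise GenerationMismatch(name)

    def create(self, name: str, data: bytes,
               expected_generation: Optional[int] = None) -> OpenKey:
        # Create phase: fail fast, then record the expectation on the open key.
        self._check(name, expected_generation)
        return OpenKey(name, data, expected_generation)

    def commit(self, open_key: OpenKey) -> int:
        # Commit phase: re-check, because another writer may have committed the
        # same key after this open key was created. In the OM this would run on
        # the single applier thread; this model is single-threaded, so the
        # check-then-write below is trivially atomic.
        self._check(open_key.name, open_key.expected_generation)
        generation = self.next_generation
        self.next_generation += 1
        self.committed[open_key.name] = (generation, open_key.data)
        return generation
```

In this model, if clients A and B both open `k` with `NOT_EXISTS` and B commits first, A's `commit` raises `KeyAlreadyExists`, which the gateway would presumably translate into `412 Precondition Failed`. The commit-phase check is what actually closes the race; the create-phase check only fails fast.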
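
The review point about where to verify the ETag can be illustrated by running the same check in two places. The hypothetical Python model below assumes, as stated in the review and the design, that `preExecute` runs concurrently on OM handler threads against state that may be stale, while `validateAndUpdateCache` runs one request at a time in Raft log order on the single applier thread. `OmModel`, `PutIfMatch`, and `etag_of` are illustrative names, and the ETag is assumed to be the MD5 hex digest of the object data.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Optional


def etag_of(data: bytes) -> str:
    # Assumption: ETag is the MD5 hex digest of the content, as for S3
    # single-part uploads.
    return hashlib.md5(data).hexdigest()


@dataclass
class PutIfMatch:
    key: str
    data: bytes
    if_match: str
    precheck_ok: Optional[bool] = None  # result of the preExecute-time check


class OmModel:
    def __init__(self) -> None:
        self.keys: Dict[str, bytes] = {}

    def _etag_matches(self, req: PutIfMatch) -> bool:
        current = self.keys.get(req.key)
        return current is not None and etag_of(current) == req.if_match

    def pre_execute(self, req: PutIfMatch) -> None:
        # Handler threads run this concurrently, before the request is in the
        # Raft log, so the state it reads may be stale by apply time.
        req.precheck_ok = self._etag_matches(req)

    def apply(self, log: List[PutIfMatch], check_at_apply: bool) -> List[int]:
        """Apply requests one at a time in log order (modelling the single
        applier thread) and return an HTTP status for each."""
        statuses = []
        for req in log:
            ok = self._etag_matches(req) if check_at_apply else bool(req.precheck_ok)
            if ok:
                self.keys[req.key] = req.data
                statuses.append(200)
            else:
                statuses.append(412)
        return statuses
```

With two concurrent `If-Match` writes that carry the same ETag, both pass a `preExecute`-only check, so the second silently overwrites the first (`[200, 200]`); checking at apply time returns `[200, 412]`, which is the compare-and-swap behavior the design is after.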
