huaxingao commented on PR #3205:
URL: https://github.com/apache/polaris/pull/3205#issuecomment-3679904001

   @dennishuo Thanks for your comment! I have updated the PR description to 
include the links to the proposal and the mailing list discussion thread.
   
   > "Reconciliation flow" - how we distinguish from different IN_PROGRESS 
error states
   
   At a high level, we don’t try to infer everything from the idempotency row 
alone. We use it together with the catalog state: heartbeats/expirations tell 
us when a row looks stuck, and the catalog tells us whether the underlying 
mutation actually happened (never ran, ambiguous, or completed).
   
   How we detect rows that need reconciliation
   
   - While a handler is running it periodically calls updateHeartbeat, which 
keeps heartbeat_at fresh for the owning executor_id.
   - A background job (or on‑demand path) scans for rows where http_status IS 
NULL and either expires_at < now() or heartbeat_at is stale or null for that 
executor_id. Those rows are “stuck IN_PROGRESS” and get handed to the 
reconciler, keyed by (operation_type, resource_id).
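
A minimal sketch of that scan predicate, written in Java (16+ for records). The `IdempotencyRow` record, the `StuckRowDetector` class, and the `heartbeatTimeout` setting are hypothetical names used only for illustration; the field names mirror the columns mentioned above, but the actual schema and staleness threshold may differ.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical in-memory view of an idempotency row (assumption, not the real schema).
record IdempotencyRow(
    String executorId,
    Integer httpStatus,   // null while the row is still IN_PROGRESS
    Instant heartbeatAt,  // null if no heartbeat was ever written
    Instant expiresAt) {}

final class StuckRowDetector {
  // Assumed tunable: how old a heartbeat may be before it counts as stale.
  private final Duration heartbeatTimeout;

  StuckRowDetector(Duration heartbeatTimeout) {
    this.heartbeatTimeout = heartbeatTimeout;
  }

  /** True if the row is unfinalized and is either expired or has a stale or missing heartbeat. */
  boolean needsReconciliation(IdempotencyRow row, Instant now) {
    if (row.httpStatus() != null) {
      return false; // already finalized, nothing to reconcile
    }
    boolean expired = row.expiresAt() != null && row.expiresAt().isBefore(now);
    boolean heartbeatStale =
        row.heartbeatAt() == null
            || row.heartbeatAt().plus(heartbeatTimeout).isBefore(now);
    return expired || heartbeatStale;
  }
}
```

This predicate only decides when a row is suspicious; it says nothing about whether the underlying mutation happened.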
   
   Case 1 – crashed before doing any durable mutations
   
   - For the first operations we’re targeting (e.g. commit-table), all catalog 
changes are wrapped in a single metastore transaction, so from the catalog’s 
point of view there are only two possibilities: the commit exists, or it 
doesn’t.
   - If the reconciler can’t find any evidence of the mutation (no commit, no 
new metadata, all invariants unchanged), it treats this as “no durable 
mutations happened” and simply releases the reservation (deletes or expires the 
row). A future request with the same idempotency key can then reserve again and 
actually run the operation.
   
   Case 2 – crash mid‑mutation (truly ambiguous)
   
   - With the current single‑transaction model we don’t expect a 
partially‑committed state: the DB either commits or rolls back.
   - If in the future we introduce an operation with multiple independently 
durable steps, its reconciler would encode invariants over its state and, if it 
ever saw a partially‑updated / inconsistent view, it would treat that as truly 
ambiguous and finalize the idempotency row as a terminal error (e.g. 
idempotency_reconcile_failed). Duplicates would then see a stable failure 
rather than risk double‑applying side effects.
   
   Case 3 – crashed after successfully mutating but before idempotency‑key 
finalization
   
   - If the reconciler does see that the mutation clearly completed (e.g. the 
expected commit/metadata for (operation_type, resource_id) is present and 
consistent), it treats the operation as logically successful.
   - It then reconstructs the “minimal” response from the canonical state plus 
the stored response_summary, calls finalizeRecord with a 2xx http_status, and 
any future duplicate requests with the same key just replay that success 
instead of re‑running the mutation.
   
   So the “never started vs mid‑flight vs finished but not finalized” 
distinction really comes from the per‑operation reconciliation logic against 
the catalog; heartbeat_at / executor_id / expires_at only decide when a row is 
suspicious enough to hand over to that reconciler.
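
The three cases can be sketched as a small decision function. This is an illustration under assumptions, not the PR's implementation: `CatalogEvidence`, `ReconcileOutcome`, `CatalogInspector`, and `StuckRowReconciler` are hypothetical names, and the sketch only returns the decision instead of touching the idempotency store.

```java
// What a per-operation check against the catalog could report (hypothetical).
enum CatalogEvidence { NOT_APPLIED, PARTIALLY_APPLIED, APPLIED }

// What the reconciler would then do with the stuck idempotency row (hypothetical).
enum ReconcileOutcome { RELEASE_RESERVATION, FINALIZE_AS_ERROR, FINALIZE_AS_SUCCESS }

// Assumed hook: each operation_type supplies its own catalog inspection logic.
interface CatalogInspector {
  CatalogEvidence inspect(String operationType, String resourceId);
}

final class StuckRowReconciler {
  private final CatalogInspector inspector;

  StuckRowReconciler(CatalogInspector inspector) {
    this.inspector = inspector;
  }

  ReconcileOutcome reconcile(String operationType, String resourceId) {
    CatalogEvidence evidence = inspector.inspect(operationType, resourceId);
    return switch (evidence) {
      // Case 1: no durable mutation found, so a retry with the same key may run again.
      case NOT_APPLIED -> ReconcileOutcome.RELEASE_RESERVATION;
      // Case 2: inconsistent view; duplicates should see a stable failure
      // (e.g. idempotency_reconcile_failed) rather than risk double-applying.
      case PARTIALLY_APPLIED -> ReconcileOutcome.FINALIZE_AS_ERROR;
      // Case 3: mutation completed; real code would call finalizeRecord with a 2xx status.
      case APPLIED -> ReconcileOutcome.FINALIZE_AS_SUCCESS;
    };
  }
}
```

With the current single-transaction model, an inspector for commit-table would be expected to return only `NOT_APPLIED` or `APPLIED`.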
   
   > How we intend to actually use response_summary to ensure reconstructible 
responses for duplicate requests. 
   
   For response_summary, the intent is exactly what you’re describing: it’s not 
meant to store the full HTTP response body, but a small, operation‑specific 
“replay token” that lets us reconstruct an equivalent response for duplicates.
   Concretely:
   
   - Each idempotent endpoint will have a small adapter that defines:
   
       1. how to take the “real” response and distill it into a compact JSON 
response_summary plus a whitelisted set of headers serialized into 
response_headers, and
       
       2. how to take that summary and reconstruct an HTTP response that is 
semantically equivalent for the client.
   
   - For “small” responses, that summary can just be the full body (e.g., a 
tiny JSON result), but for large responses we’ll only store stable identifiers 
/ pointers instead of the entire payload.
   - For example, for commit-table / updateTable we would not store the full 
TableMetadata blob in the idempotency row. Instead, the response_summary would 
likely contain:
       
       1. the fully‑qualified table identifier,
   
       2. a pointer to the metadata location (e.g., the metadata JSON file path 
/ version / snapshot id), and
   
       3. any other minimal fields needed to reproduce the wire‑level response.
   
   When replaying a duplicate, the reconstructor can follow that pointer back 
to the metastore / object storage, load the canonical metadata, and build the 
same logical response that the original call would have produced.
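
A rough sketch of such an adapter for commit-table, assuming Java 16+. All type names here (`ResponseSummaryAdapter`, `CommitTableResponse`, `CommitTableSummary`, `CommitTableAdapter`) are hypothetical, the response is reduced to three string fields for brevity, and the `metadataLoader` function stands in for whatever metastore or object storage read the real reconstructor would perform.

```java
import java.util.function.Function;

// Hypothetical per-endpoint adapter: distill a response, then rebuild it for duplicates.
interface ResponseSummaryAdapter<R, S> {
  S summarize(R response);
  R reconstruct(S summary);
}

// Simplified stand-in for the wire-level commit-table response (assumption).
record CommitTableResponse(String tableIdentifier, String metadataLocation, String metadataJson) {}

// Compact summary: identifiers and a pointer only, never the full metadata blob.
record CommitTableSummary(String tableIdentifier, String metadataLocation) {}

final class CommitTableAdapter
    implements ResponseSummaryAdapter<CommitTableResponse, CommitTableSummary> {

  // Stand-in for loading the canonical metadata JSON from its location.
  private final Function<String, String> metadataLoader;

  CommitTableAdapter(Function<String, String> metadataLoader) {
    this.metadataLoader = metadataLoader;
  }

  @Override
  public CommitTableSummary summarize(CommitTableResponse response) {
    return new CommitTableSummary(response.tableIdentifier(), response.metadataLocation());
  }

  @Override
  public CommitTableResponse reconstruct(CommitTableSummary summary) {
    // Follow the stored pointer back to the canonical metadata.
    String metadataJson = metadataLoader.apply(summary.metadataLocation());
    return new CommitTableResponse(
        summary.tableIdentifier(), summary.metadataLocation(), metadataJson);
  }
}
```

The round trip `reconstruct(summarize(response))` yields an equal response only as long as the metadata file at the stored location is still readable and unchanged, which is one reason the per-operation contract needs to be written down.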
   
   I agree this needs to be spelled out per operation_type, so in the design 
doc I plan to add a small subsection for each idempotent operation (e.g., 
commit-table, drop-table) that defines the shape of its response_summary 
and how its reconstructor turns that back into an HTTP response. 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
