deniskuzZ commented on code in PR #6281:
URL: https://github.com/apache/hive/pull/6281#discussion_r2741818469
##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/jdbc/queries/ReadyToCleanHandler.java:
##########
@@ -88,7 +88,7 @@ public String getParameterizedQueryString(DatabaseProduct databaseProduct) throw
"ON \"cq1\".\"CQ_DATABASE\" = \"hwm\".\"MH_DATABASE\"" +
" AND \"cq1\".\"CQ_TABLE\" = \"hwm\".\"MH_TABLE\"";
- whereClause += " AND (\"CQ_HIGHEST_WRITE_ID\" < \"MIN_OPEN_WRITE_ID\" OR \"MIN_OPEN_WRITE_ID\" IS NULL)";
+ whereClause += " AND (\"CQ_HIGHEST_WRITE_ID\" < \"MIN_OPEN_WRITE_ID\"-1 OR \"MIN_OPEN_WRITE_ID\" IS NULL)";
Review Comment:
Sure.
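To make the diff concrete, here is a minimal Java sketch of the old and new WHERE fragments as plain boolean predicates. The class and method names (`CleanerEligibility`, `oldCondition`, `newCondition`) are hypothetical, a Java `null` stands in for SQL `NULL`, and the real check remains the SQL built in `ReadyToCleanHandler`.

```java
/** Hypothetical Java mirror of the ReadyToCleanHandler WHERE fragment (illustration only). */
public final class CleanerEligibility {

    private CleanerEligibility() {}

    /** Before: "CQ_HIGHEST_WRITE_ID" < "MIN_OPEN_WRITE_ID" OR "MIN_OPEN_WRITE_ID" IS NULL. */
    public static boolean oldCondition(long cqHighestWriteId, Long minOpenWriteId) {
        // Check for null first so the Long is never unboxed when it is absent.
        return minOpenWriteId == null || cqHighestWriteId < minOpenWriteId;
    }

    /** After: "CQ_HIGHEST_WRITE_ID" < "MIN_OPEN_WRITE_ID"-1 OR "MIN_OPEN_WRITE_ID" IS NULL. */
    public static boolean newCondition(long cqHighestWriteId, Long minOpenWriteId) {
        return minOpenWriteId == null || cqHighestWriteId < minOpenWriteId - 1;
    }

    public static void main(String[] args) {
        // Values from the scenario: compaction HWM 50, reader's recorded min open writeId 51.
        System.out.println("old: " + oldCondition(50, 51L));
        System.out.println("new: " + newCondition(50, 51L));
        System.out.println("new, MIN_OPEN_WRITE_ID null: " + newCondition(50, null));
    }
}
```

Running `main` prints `old: true`, `new: false` and `new, MIN_OPEN_WRITE_ID null: true`, which is the behaviour change the scenario below walks through.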
**Scenario**
1. Multiple deltas exist with max writeId = 50
2. SELECT query (TXN_ID=100) starts:
- Takes snapshot, sees writeId 50 as committed
- Records minOpenWriteId = 51 (first writeId it cannot see)
- Begins reading from deltas up to writeId 50
3. Compaction (TXN_ID=101) runs:
- Produces base_50 (includes all data up to writeId 50)
- Does not create new writeIds
- Commits
4. Cleaner starts and evaluates cleanup eligibility
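A toy model of steps 1 to 3 may help show where the numbers come from. It assumes, as the scenario does, that no writer transaction is open, so the first writeId the reader cannot see is simply the highest committed writeId + 1. `ToyTable`, `openReader` and `compact` are hypothetical names rather than Hive APIs, and directory names are simplified to the `base_50` style used here.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the scenario; not Hive code. Assumes no writer transaction is open. */
public final class ToyTable {

    private final long highestCommittedWriteId;          // 50 in the scenario
    private final List<String> dirs = new ArrayList<>(); // simplified directory names

    public ToyTable(long highestCommittedWriteId) {
        this.highestCommittedWriteId = highestCommittedWriteId;
    }

    /** Step 1: record a delta directory covering a committed writeId range. */
    public void addDelta(long minWriteId, long maxWriteId) {
        dirs.add("delta_" + minWriteId + "_" + maxWriteId);
    }

    /** Step 2: the reader's min open writeId is the first writeId it cannot see. */
    public long openReader() {
        return highestCommittedWriteId + 1;
    }

    /** Step 3: compaction writes base_<hwm> and allocates no new writeId. */
    public long compact() {
        long hwm = highestCommittedWriteId;
        dirs.add("base_" + hwm);
        return hwm;
    }

    public List<String> dirs() {
        return dirs;
    }

    public static void main(String[] args) {
        ToyTable table = new ToyTable(50);
        table.addDelta(1, 25);
        table.addDelta(26, 50);
        long readerMinOpenWriteId = table.openReader();
        long compactionHwm = table.compact();
        System.out.println(readerMinOpenWriteId + " " + compactionHwm + " " + table.dirs());
    }
}
```

With these inputs the reader records 51 while compaction reports an HWM of 50, and the old deltas still sit next to `base_50`, which is exactly the state the cleaner has to judge in step 4.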
**Without the -1 offset**
Condition: CQ_HIGHEST_WRITE_ID < MIN_OPEN_WRITE_ID
- 50 < 51 = true → cleaner proceeds
- Problem: The SELECT query (TXN_ID=100) is still running and reading from the deltas
- The reader cannot switch to base_50 mid-query; its snapshot was taken before compaction committed
- The cleaner fails with (txnid:100 is open and <= hwm: 50)
**With the -1 offset**
Condition: CQ_HIGHEST_WRITE_ID < MIN_OPEN_WRITE_ID - 1
- 50 < 51 - 1, i.e. 50 < 50, is false → cleaner does not proceed
- The cleaner waits until CQ_HIGHEST_WRITE_ID < MIN_OPEN_WRITE_ID - 1 is true
- This requires a gap of at least 1 writeId between compaction's HWM and the minimum open writeId
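The gap requirement can be checked by brute force: for a compaction HWM of 50, the sketch below searches for the smallest non-NULL MIN_OPEN_WRITE_ID that satisfies the new condition. It only evaluates the arithmetic of the predicate from the diff; `GapCheck`, `passes` and `smallestPassingMinOpen` are hypothetical names.

```java
/** Illustration only: find the smallest MIN_OPEN_WRITE_ID for which the new condition holds. */
public final class GapCheck {

    private GapCheck() {}

    /** New predicate for a non-NULL MIN_OPEN_WRITE_ID: hwm < minOpenWriteId - 1. */
    static boolean passes(long hwm, long minOpenWriteId) {
        return hwm < minOpenWriteId - 1;
    }

    /** Values <= hwm + 1 can never pass, so the scan starts at hwm + 1. */
    static long smallestPassingMinOpen(long hwm) {
        long candidate = hwm + 1;
        while (!passes(hwm, candidate)) {
            candidate++;
        }
        return candidate;
    }

    public static void main(String[] args) {
        long hwm = 50;
        long min = smallestPassingMinOpen(hwm);
        // Number of writeIds strictly between the HWM and the min open writeId.
        long gap = min - hwm - 1;
        System.out.println(min + " (gap of " + gap + " writeId)");
    }
}
```

For an HWM of 50 this prints `52 (gap of 1 writeId)`: writeId 51 has to lie strictly between the compaction's HWM and the minimum open writeId before the cleaner is allowed to run.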
The -1 offset protects long-running readers that started before compaction committed by ensuring cleanup only proceeds when there is a writeId gap, indicating it is safe to delete the obsolete deltas.
--
This is an automated message from the Apache Git Service.