leaves12138 opened a new pull request, #7972:
URL: https://github.com/apache/paimon/pull/7972

   ### Purpose
   
   Row-id reassignment rewrites the row-id ranges of data evolution tables. If a 
global index build or index compact job plans its work before a reassignment and 
commits after it, the job can commit stale index metadata that still refers to 
the old row ids.
   
   This PR adds a snapshot-level `row-id-reassign=true` marker and makes 
row-id-sensitive index commits fail fast when such a marker appears after the 
snapshot they planned from.
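
   The fail-fast rule can be sketched roughly as follows. This is a minimal 
illustration, not the code in this PR: `RowIdReassignCheck`, `SnapshotInfo`, 
`firstReassignAfter` and `checkNoReassign` are hypothetical names, and the 
snapshot is reduced to an id plus a string property map. Only the property 
key/value and the error message wording come from this description.

```java
import java.util.List;
import java.util.Map;
import java.util.OptionalLong;

/** Hypothetical sketch of the row-id-reassign conflict check; not Paimon's actual classes. */
final class RowIdReassignCheck {

    // Key taken from the marker named in this description ("row-id-reassign=true").
    static final String ROW_ID_REASSIGN_PROPERTY = "row-id-reassign";

    /** Hypothetical stand-in for a snapshot: an id and its string properties. */
    static final class SnapshotInfo {
        final long id;
        final Map<String, String> properties;

        SnapshotInfo(long id, Map<String, String> properties) {
            this.id = id;
            this.properties = properties;
        }
    }

    /** Returns the smallest id of a marked snapshot newer than the planned one, if any. */
    static OptionalLong firstReassignAfter(long plannedSnapshotId, List<SnapshotInfo> snapshots) {
        return snapshots.stream()
                .filter(s -> s.id > plannedSnapshotId)
                .filter(s -> "true".equals(s.properties.get(ROW_ID_REASSIGN_PROPERTY)))
                .mapToLong(s -> s.id)
                .min();
    }

    /** Throws if a reassignment snapshot was committed after the planned snapshot. */
    static void checkNoReassign(long plannedSnapshotId, List<SnapshotInfo> snapshots) {
        OptionalLong conflict = firstReassignAfter(plannedSnapshotId, snapshots);
        if (conflict.isPresent()) {
            throw new IllegalStateException(String.format(
                    "Row-id reassignment snapshot %d was committed after the task planned "
                            + "from snapshot %d. The task must be retried with the latest row ids.",
                    conflict.getAsLong(), plannedSnapshotId));
        }
    }
}
```

   With an unmarked snapshot 903 and a marked snapshot 904, calling 
`checkNoReassign(903, ...)` in this sketch throws, while a task that planned 
from snapshot 904 passes the check.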
   
   ### Changes
   
   - Add `Snapshot.ROW_ID_REASSIGN_PROPERTY`.
   - Add row-id-reassign conflict detection in `ConflictDetection` and wire it 
through `FileStoreCommit` / table commit implementations.
   - Record the scan snapshot used by btree and generic global index builders.
   - Pass the planned scan snapshot into btree and generic global index commits.
   - Strip the transient reassignment marker from normal manifest 
rewrite/compact snapshots.
   - Add focused tests for detecting reassignment conflicts and 
preserving/removing snapshot properties.
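
   The marker stripping can be pictured with the sketch below. It assumes, 
purely for illustration, that a manifest rewrite/compact snapshot starts from a 
copy of the previous snapshot's properties; how the PR actually carries 
properties forward is not described here. `TransientMarker` and 
`stripReassignMarker` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: drop the transient marker while keeping other properties. */
final class TransientMarker {

    // Key taken from the marker named in this description.
    static final String ROW_ID_REASSIGN_PROPERTY = "row-id-reassign";

    /** Returns a copy of the given properties without the reassignment marker. */
    static Map<String, String> stripReassignMarker(Map<String, String> previous) {
        // Copy first so the caller's map (the earlier snapshot) is left untouched.
        Map<String, String> copy = new HashMap<>();
        if (previous != null) {
            copy.putAll(previous);
        }
        copy.remove(ROW_ID_REASSIGN_PROPERTY);
        return copy;
    }
}
```

   Keeping the marker only on the reassignment snapshot itself means a later 
rewrite/compact snapshot is not mistaken for a second reassignment.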
   
   ### Validation
   
   - `mvn -pl paimon-core -Dtest=ConflictDetectionTest,FileStoreCommitTest test`
   - `mvn -pl paimon-flink/paimon-flink-common -Dtest=BTreeIndexTopoBuilderTest 
test`
   - Preprod validation with a data evolution table:
     - created a 48-partition table state with 1,177,600 rows and incremental 
btree index holes;
     - ran incremental btree index build and row-id reassignment concurrently;
     - confirmed reassignment snapshot carried `row-id-reassign=true`;
     - confirmed stale index commit fails with:
       `Row-id reassignment snapshot 904 was committed after the task planned 
from snapshot 903. The task must be retried with the latest row ids.`
     - retried index build after reassignment and verified `MISSING_INDEX=0` 
plus indexed point queries returned the expected rows.
   

