XuQianJin-Stars opened a new pull request, #3286:
URL: https://github.com/apache/fluss/pull/3286

   TieringSourceEnumerator now acquires a KV snapshot lease for all 
TieringSnapshotSplits before they are assigned to readers, and releases the 
lease when the table finishes or fails tiering, or when a reader failover 
returns the splits. A best-effort `dropLease` is also performed on enumerator 
close. This prevents the Fluss server from cleaning up snapshots that the 
tiering job still depends on.
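   A minimal sketch of the lease lifecycle described above. It assumes a hypothetical `SnapshotLeaseClient` interface whose three methods mirror the `acquireSnapshots` / `releaseSnapshots` / `dropLease` calls by name only; the real Fluss admin signatures and snapshot identifiers differ, and snapshots are modeled here as opaque strings grouped by table id. `LeaseBookkeeper` and `RecordingLeaseClient` are illustrative names, not classes from this PR.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the admin-side lease API; the real Fluss
// client types and signatures may differ.
interface SnapshotLeaseClient {
    void acquireSnapshots(String leaseId, Set<String> snapshots, Duration duration);
    void releaseSnapshots(String leaseId, Set<String> snapshots);
    void dropLease(String leaseId);
}

// Hypothetical enumerator-side bookkeeping: leased snapshots grouped by table id.
class LeaseBookkeeper implements AutoCloseable {
    private static final Duration LEASE_DURATION = Duration.ofDays(1);

    private final SnapshotLeaseClient client;
    private final String leaseId;
    private final Map<Long, Set<String>> leasedByTable = new HashMap<>();

    LeaseBookkeeper(SnapshotLeaseClient client, String leaseId) {
        this.client = client;
        this.leaseId = leaseId;
    }

    // Lease every snapshot split of a table BEFORE any split is handed to a
    // reader; if the call throws, nothing is tracked and nothing is assigned.
    void acquireBeforeAssign(long tableId, Set<String> snapshots) {
        client.acquireSnapshots(leaseId, snapshots, LEASE_DURATION);
        leasedByTable.computeIfAbsent(tableId, k -> new HashSet<>()).addAll(snapshots);
    }

    // Called when the table finishes or fails tiering.
    void releaseTable(long tableId) {
        Set<String> held = leasedByTable.remove(tableId);
        if (held != null && !held.isEmpty()) {
            client.releaseSnapshots(leaseId, held);
        }
    }

    boolean holds(long tableId) {
        return leasedByTable.containsKey(tableId);
    }

    // Best effort: drop everything this job still holds.
    @Override
    public void close() {
        try {
            client.dropLease(leaseId);
        } catch (RuntimeException e) {
            // Swallowed in this sketch; a real implementation would log a warning here.
        }
        leasedByTable.clear();
    }
}

// In-memory fake used only to exercise the sketch.
class RecordingLeaseClient implements SnapshotLeaseClient {
    final List<String> calls = new ArrayList<>();
    final Set<String> held = new HashSet<>();

    public void acquireSnapshots(String leaseId, Set<String> snapshots, Duration d) {
        calls.add("acquire " + leaseId);
        held.addAll(snapshots);
    }

    public void releaseSnapshots(String leaseId, Set<String> snapshots) {
        calls.add("release " + leaseId);
        held.removeAll(snapshots);
    }

    public void dropLease(String leaseId) {
        calls.add("drop " + leaseId);
        held.clear();
    }
}
```

   Keeping acquisition strictly before assignment means a failed `acquireSnapshots` call never leaves a split in a reader's hands without a lease, and the per-table map is what allows a targeted release when one table finishes while others are still being tiered.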
   
   One lease id per tiering job (UUID-based) is reused across tables and 
persisted into `TieringSourceEnumeratorState` so that it survives enumerator 
restore instead of leaking orphan leases. The lease uses a fixed 1-day duration 
that is implicitly renewed by every `acquireSnapshots` call, and 
`UnsupportedVersionException` from older Fluss servers is downgraded to a 
warning to keep backward compatibility.
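   The lease-id reuse and the old-server downgrade can be sketched as follows. This assumes the restored id is `null` when the enumerator starts fresh, uses a local stand-in for `UnsupportedVersionException` instead of the Fluss client class, and models the acquire call with a hypothetical `AcquireCall` interface; `LeaseSetup` is an illustrative name.

```java
import java.time.Duration;
import java.util.UUID;
import java.util.logging.Logger;

// Local stand-in for the exception an older server surfaces; the real
// class lives in the Fluss client library.
class UnsupportedVersionException extends RuntimeException {
    UnsupportedVersionException(String msg) {
        super(msg);
    }
}

// Hypothetical single-method view of the acquire call.
interface AcquireCall {
    void acquire(String leaseId, Duration duration);
}

class LeaseSetup {
    private static final Logger LOG = Logger.getLogger(LeaseSetup.class.getName());

    // Fixed duration; every acquire call implicitly renews it.
    static final Duration LEASE_DURATION = Duration.ofDays(1);

    // Reuse the id from checkpointed state if present, otherwise mint a new
    // UUID-based id for this tiering job.
    static String resolveLeaseId(String restoredLeaseId) {
        return restoredLeaseId != null ? restoredLeaseId : UUID.randomUUID().toString();
    }

    // Returns true if the lease was acquired, false if the server predates
    // the lease API and the job should continue without a lease.
    static boolean tryAcquire(AcquireCall call, String leaseId) {
        try {
            call.acquire(leaseId, LEASE_DURATION);
            return true;
        } catch (UnsupportedVersionException e) {
            LOG.warning("Server does not support KV snapshot leases, continuing without one: "
                    + e.getMessage());
            return false;
        }
    }
}
```

   Only the version error is downgraded in this sketch; any other failure still propagates, so a genuine lease problem is not silently ignored.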
   
   ### Purpose
   
   Linked issue: close #2898
   
   Before this change, the tiering job read Fluss KV snapshots without 
holding any lease on them. A long-running tiering job could therefore race with 
the server-side snapshot GC: the server could clean up a snapshot that the 
tiering `SourceReader` is still consuming or is about to consume, causing 
tiering failures or data loss on the lake side.
   
   This PR makes `TieringSource` hold a KV snapshot lease for the full 
lifecycle of each snapshot split it hands out, so that the Fluss server will 
not reclaim those snapshots while tiering is in progress.
   
   ### Brief change log
   
   - `TieringSourceEnumerator`
     - Generate one `kvSnapshotLeaseId` per tiering job (UUID-based) and reuse 
it across all tables.
     - Before assigning any `TieringSnapshotSplit` to a reader, call 
`acquireSnapshots(leaseId, snapshots, 1 day)` on the admin / gateway client to 
acquire a lease covering all snapshot splits of the table.
     - Track in-flight leased snapshots per table; release the lease 
(`releaseSnapshots`) when the table finishes tiering, fails tiering, or when a 
reader failover returns the splits back to the enumerator.
     - On enumerator `close()`, best-effort `dropLease(leaseId)` to release 
everything still held by this job.
     - Downgrade `UnsupportedVersionException` (old server) to a warning log, 
so the tiering job keeps working against older Fluss servers without the lease 
API.
   - `TieringSourceEnumeratorState` + `TieringSourceEnumeratorStateSerializer`
     - Persist `kvSnapshotLeaseId` into the enumerator checkpoint state (new 
serializer version, backward compatible with the previous version).
   - `TieringSource`
     - On `restoreEnumerator`, reuse the persisted `kvSnapshotLeaseId` from the 
checkpoint so the recovered enumerator does not generate a new UUID and leak 
the previous lease.
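   For the reader-failover path, a Flink enumerator receives the returned splits through `SplitEnumerator#addSplitsBack(splits, subtaskId)`. The sketch below, with a hypothetical simplified `SnapshotSplit` and a `BiConsumer` standing in for `releaseSnapshots`, releases only the snapshots of the returned splits and re-queues them. It assumes they are leased again by the acquire-before-assign step when reassigned; whether the PR re-acquires in exactly this way is not stated above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.BiConsumer;

// Hypothetical, simplified split with only the fields lease tracking needs.
final class SnapshotSplit {
    final long tableId;
    final String snapshotKey; // illustrative key such as "b0@s5"

    SnapshotSplit(long tableId, String snapshotKey) {
        this.tableId = tableId;
        this.snapshotKey = snapshotKey;
    }
}

class FailoverHandler {
    private final Map<Long, Set<String>> leasedByTable = new HashMap<>();
    private final List<SnapshotSplit> pending = new ArrayList<>();
    // Stand-in for releaseSnapshots(leaseId, snapshots); signature is hypothetical.
    private final BiConsumer<Long, Set<String>> release;

    FailoverHandler(BiConsumer<Long, Set<String>> release) {
        this.release = release;
    }

    void markLeased(SnapshotSplit s) {
        leasedByTable.computeIfAbsent(s.tableId, k -> new HashSet<>()).add(s.snapshotKey);
    }

    // Same shape as Flink's SplitEnumerator#addSplitsBack(splits, subtaskId).
    void addSplitsBack(List<SnapshotSplit> splits, int subtaskId) {
        Map<Long, Set<String>> toRelease = new HashMap<>();
        for (SnapshotSplit s : splits) {
            Set<String> held = leasedByTable.get(s.tableId);
            if (held != null && held.remove(s.snapshotKey)) {
                toRelease.computeIfAbsent(s.tableId, k -> new HashSet<>()).add(s.snapshotKey);
                if (held.isEmpty()) {
                    leasedByTable.remove(s.tableId);
                }
            }
            pending.add(s); // re-queued; leased again before reassignment
        }
        toRelease.forEach(release); // one release call per affected table
    }

    int pendingCount() {
        return pending.size();
    }

    boolean isLeased(long tableId, String key) {
        Set<String> held = leasedByTable.get(tableId);
        return held != null && held.contains(key);
    }
}
```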
   
   ### Tests
   
   - `TieringSourceEnumeratorTest`
     - New cases covering: lease is acquired before snapshot splits are 
assigned; lease is released on table finish / fail / reader failover; 
`dropLease` is invoked on enumerator close; enumerator works gracefully when 
the server returns `UnsupportedVersionException`.
   - `TieringSourceEnumeratorStateSerializerTest`
     - Round-trip tests for the new `kvSnapshotLeaseId` field, plus a 
backward-compatibility case that deserializes a state written by the previous 
serializer version.
   - `mvn clean verify` passes locally for the affected modules.
   
   ### API and Format
   
   - No public user-facing API change.
   - `TieringSourceEnumeratorState` checkpoint format is extended with a new 
`kvSnapshotLeaseId` field. The serializer version is bumped and older 
checkpoints remain readable (the field defaults to a freshly generated UUID on 
restore from old state).
   - No storage / wire format change on the Fluss server side; this PR only 
consumes the existing `acquireSnapshots` / `releaseSnapshots` / `dropLease` 
admin APIs.
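   The checkpoint-format change follows the usual versioned-serializer pattern. The sketch below is loosely shaped like Flink's `SimpleVersionedSerializer` (`getVersion()` plus `deserialize(version, bytes)`) but has no Flink dependency and wraps I/O errors in `UncheckedIOException`; `EnumeratorState` and its `payload` field are hypothetical stand-ins for the real state, and the version numbers are illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.UUID;

// Hypothetical, trimmed-down state; "payload" stands in for whatever the
// previous serializer version already stored.
final class EnumeratorState {
    final String payload;
    final String kvSnapshotLeaseId;

    EnumeratorState(String payload, String kvSnapshotLeaseId) {
        this.payload = payload;
        this.kvSnapshotLeaseId = kvSnapshotLeaseId;
    }
}

class EnumeratorStateSerializer {
    static final int VERSION_1 = 1; // before the lease id existed
    static final int VERSION_2 = 2; // appends kvSnapshotLeaseId

    int getVersion() {
        return VERSION_2;
    }

    byte[] serialize(EnumeratorState state) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(state.payload);
            out.writeUTF(state.kvSnapshotLeaseId);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    EnumeratorState deserialize(int version, byte[] serialized) {
        if (version != VERSION_1 && version != VERSION_2) {
            throw new IllegalArgumentException("Unknown serializer version: " + version);
        }
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(serialized))) {
            String payload = in.readUTF();
            // Old checkpoints carry no lease id, so a fresh one is minted on restore.
            String leaseId = version == VERSION_1 ? UUID.randomUUID().toString() : in.readUTF();
            return new EnumeratorState(payload, leaseId);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

   Because the version travels with the checkpoint bytes, the new code can still read state written before the lease id existed, which is what keeps older checkpoints restorable.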
   
   ### Documentation
   
   - No new user-facing feature or configuration option is introduced, so no 
documentation update is required.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
