Re: [I] CommitRemoteLogManifestITCase#testDeleteOutOfSyncReplicaLogAfterCommit is unstable [fluss]

via GitHub Mon, 08 Sep 2025 06:29:23 -0700


luoyuxia commented on issue #1583:
URL: https://github.com/apache/fluss/issues/1583#issuecomment-3266326146


   After looking into the failed run  
https://github.com/apache/fluss/actions/runs/17547578108/job/49832509669?pr=1375
   I saw the following logs:
   ```
   **16:36:55,281 [fluss-netty-server-worker-thread-1] INFO  
org.apache.fluss.server.log.LocalLog                         [] - Rolled new 
log segment at offset 0**
   **16:36:56,197 [fluss-scheduler-0-thread-2] INFO  
org.apache.fluss.server.replica.Replica                      [] - Shrink ISR 
From [0, 1, 2] to [0]. Leader: (high watermark: 0, end offset: 10, out of sync 
replicas: [1, 2])**
   16:36:56,200 [fluss-netty-client(EPOLL)-61-3] INFO  
org.apache.fluss.server.replica.Replica                      [] - ISR updated 
to [0] and bucket epoch updated to 1 for bucket TableBucket{tableId=0, bucket=0}
   **16:36:56,202 [fluss-netty-server-worker-thread-1] INFO  
org.apache.fluss.server.log.LocalLog                         [] - Rolled new 
log segment at offset 10**
   16:36:56,204 [fluss-netty-server-worker-thread-1] INFO  
org.apache.fluss.server.log.WriterStateManager               [] - Wrote writer 
snapshot at offset 10 with 0 producer ids for table bucket 
TableBucket{tableId=0, bucket=0} in 2 ms.
   **16:36:56,206 [fluss-netty-server-worker-thread-1] INFO  
org.apache.fluss.server.log.LocalLog                         [] - Rolled new 
log segment at offset 20**
   16:36:56,207 [fluss-netty-server-worker-thread-1] INFO  
org.apache.fluss.server.log.WriterStateManager               [] - Wrote writer 
snapshot at offset 20 with 0 producer ids for table bucket 
TableBucket{tableId=0, bucket=0} in 1 ms.
   16:36:56,208 [fluss-netty-server-worker-thread-3] INFO  
org.apache.fluss.server.replica.fetcher.ReplicaFetcherManager [] - Remove 
fetcher for buckets: [TableBucket{tableId=0, bucket=0}]
   16:36:56,209 [fluss-netty-server-worker-thread-1] INFO  
org.apache.fluss.server.replica.fetcher.ReplicaFetcherManager [] - Remove 
fetcher for buckets: [TableBucket{tableId=0, bucket=0}]
   **16:36:56,210 [fluss-netty-server-worker-thread-1] INFO  
org.apache.fluss.server.log.LocalLog                         [] - Rolled new 
log segment at offset 30**
   ```
   From the logs, we can see that:
   - First, 10 records are appended, according to the log `Rolled new log 
segment at offset 0`
   - The followers then fall out of sync, according to the log `Shrink ISR 
From [0, 1, 2] to [0]`
   - Then, the other records are appended, according to the logs `Rolled new 
log segment at offset 10` and `Rolled new log segment at offset 20`
   
   Since the followers never synced any records, the check that the followers 
have 3 segments fails. 
   It seems `log.replica.max-lag-time` is too short in this test case for the 
followers to sync the log before the ISR shrinks. Maybe we can increase it to 
5 or 10 seconds. Although that is longer, the test should be more stable.
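
To illustrate the timing argument, here is a minimal sketch of a lag-based ISR check. It is a simplified model, not Fluss's actual implementation: the class `IsrLagModel`, the method `outOfSyncFollowers`, and all timings in `main` are hypothetical. It only shows that followers which have not caught up within the max lag time are treated as out of sync, and that a larger value keeps them in the ISR for the same timings.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Simplified, hypothetical model of lag-based ISR shrinking (not Fluss code). */
final class IsrLagModel {

    /**
     * Returns the ids of followers whose last caught-up time is more than
     * maxLag behind the current time, in ascending id order.
     */
    static List<Integer> outOfSyncFollowers(
            Map<Integer, Duration> lastCaughtUp, Duration now, Duration maxLag) {
        List<Integer> lagging = new ArrayList<>();
        // TreeMap gives a deterministic iteration order over follower ids.
        for (Map.Entry<Integer, Duration> e : new TreeMap<>(lastCaughtUp).entrySet()) {
            Duration lag = now.minus(e.getValue());
            if (lag.compareTo(maxLag) > 0) {
                lagging.add(e.getKey());
            }
        }
        return lagging;
    }

    public static void main(String[] args) {
        // Assumed timings: followers 1 and 2 last caught up at t = 0,
        // and the periodic check runs at t = 1.2 s.
        Map<Integer, Duration> lastCaughtUp = Map.of(1, Duration.ZERO, 2, Duration.ZERO);
        Duration now = Duration.ofMillis(1200);

        // With a 1 s max lag time both followers are dropped from the ISR.
        System.out.println(outOfSyncFollowers(lastCaughtUp, now, Duration.ofSeconds(1)));  // [1, 2]
        // With a 10 s max lag time they stay in sync for the same timings.
        System.out.println(outOfSyncFollowers(lastCaughtUp, now, Duration.ofSeconds(10))); // []
    }
}
```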
   
   What's more, I checked the log of a successful run:
   ```
   **330 [ReplicaFetcherThread-0-2] INFO  org.apache.fluss.server.log.LocalLog 
[] - Rolled new log segment at offset 0**
   **7337 [fluss-netty-server-worker-thread-2] INFO  
org.apache.fluss.server.log.LocalLog [] - Rolled new log segment at offset 10**
   ...
   **7870 [fluss-netty-server-worker-thread-2] INFO  
org.apache.fluss.server.log.LocalLog [] - Rolled new log segment at offset 20**
   ...
   **11073 [fluss-scheduler-0-thread-8] INFO  
org.apache.fluss.server.replica.Replica [] - Shrink ISR From [2, 1, 0] to [2]. 
Leader: (high watermark: 30, end offset: 40, out of sync replicas: [0, 1])**
   ```
   The ISR shrink (which is expected to be triggered by the code 
`FLUSS_CLUSTER_EXTENSION.stopReplica(stopFollower, tb, 1);`) should happen only 
after the records have been appended successfully.
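
The ordering difference between the two runs can be checked mechanically. The following is a rough sketch, assuming each log entry sits on a single line (the excerpts above are wrapped by the mail archive); the class `ShrinkOrderCheck` and its method are hypothetical helpers, not part of Fluss or the test. It collects the segment offsets rolled before the first `Shrink ISR` entry.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for inspecting the order of log events (not Fluss code). */
final class ShrinkOrderCheck {

    private static final Pattern ROLLED =
            Pattern.compile("Rolled new log segment at offset (\\d+)");

    /** Returns the rolled segment offsets logged before the first "Shrink ISR" line. */
    static List<Long> rolledOffsetsBeforeShrink(List<String> logLines) {
        List<Long> offsets = new ArrayList<>();
        for (String line : logLines) {
            if (line.contains("Shrink ISR")) {
                break; // stop at the first ISR shrink
            }
            Matcher m = ROLLED.matcher(line);
            if (m.find()) {
                offsets.add(Long.parseLong(m.group(1)));
            }
        }
        return offsets;
    }

    public static void main(String[] args) {
        // Abbreviated entries modelled on the failed run.
        List<String> failed = List.of(
                "LocalLog [] - Rolled new log segment at offset 0",
                "Replica [] - Shrink ISR From [0, 1, 2] to [0].",
                "LocalLog [] - Rolled new log segment at offset 10",
                "LocalLog [] - Rolled new log segment at offset 20");
        // Abbreviated entries modelled on the successful run.
        List<String> successful = List.of(
                "LocalLog [] - Rolled new log segment at offset 0",
                "LocalLog [] - Rolled new log segment at offset 10",
                "LocalLog [] - Rolled new log segment at offset 20",
                "Replica [] - Shrink ISR From [2, 1, 0] to [2].");

        System.out.println(rolledOffsetsBeforeShrink(failed));     // [0]
        System.out.println(rolledOffsetsBeforeShrink(successful)); // [0, 10, 20]
    }
}
```

This is only a heuristic, since entries from several tablet servers are interleaved in one log, but it makes the early shrink in the failed run easy to spot.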
   
   


