luoyuxia commented on issue #1583: URL: https://github.com/apache/fluss/issues/1583#issuecomment-3266326146
After looking into the failed instance https://github.com/apache/fluss/actions/runs/17547578108/job/49832509669?pr=1375, I saw these logs:

```
**16:36:55,281 [fluss-netty-server-worker-thread-1] INFO org.apache.fluss.server.log.LocalLog [] - Rolled new log segment at offset 0**
**16:36:56,197 [fluss-scheduler-0-thread-2] INFO org.apache.fluss.server.replica.Replica [] - Shrink ISR From [0, 1, 2] to [0]. Leader: (high watermark: 0, end offset: 10, out of sync replicas: [1, 2])**
16:36:56,200 [fluss-netty-client(EPOLL)-61-3] INFO org.apache.fluss.server.replica.Replica [] - ISR updated to [0] and bucket epoch updated to 1 for bucket TableBucket{tableId=0, bucket=0}
**16:36:56,202 [fluss-netty-server-worker-thread-1] INFO org.apache.fluss.server.log.LocalLog [] - Rolled new log segment at offset 10**
16:36:56,204 [fluss-netty-server-worker-thread-1] INFO org.apache.fluss.server.log.WriterStateManager [] - Wrote writer snapshot at offset 10 with 0 producer ids for table bucket TableBucket{tableId=0, bucket=0} in 2 ms.
**16:36:56,206 [fluss-netty-server-worker-thread-1] INFO org.apache.fluss.server.log.LocalLog [] - Rolled new log segment at offset 20**
16:36:56,207 [fluss-netty-server-worker-thread-1] INFO org.apache.fluss.server.log.WriterStateManager [] - Wrote writer snapshot at offset 20 with 0 producer ids for table bucket TableBucket{tableId=0, bucket=0} in 1 ms.
16:36:56,208 [fluss-netty-server-worker-thread-3] INFO org.apache.fluss.server.replica.fetcher.ReplicaFetcherManager [] - Remove fetcher for buckets: [TableBucket{tableId=0, bucket=0}]
16:36:56,209 [fluss-netty-server-worker-thread-1] INFO org.apache.fluss.server.replica.fetcher.ReplicaFetcherManager [] - Remove fetcher for buckets: [TableBucket{tableId=0, bucket=0}]
**16:36:56,210 [fluss-netty-server-worker-thread-1] INFO org.apache.fluss.server.log.LocalLog [] - Rolled new log segment at offset 30**
```

From the logs, we can tell:
- First, it appends 10 records, according to the log `Rolled new log segment at offset 0`.
- The followers then fall out of sync, according to the log `Shrink ISR From [0, 1, 2] to [0]`.
- After that, it appends the remaining records, according to the logs `Rolled new log segment at offset 10` and `Rolled new log segment at offset 20`.

Since the followers have not synced any records, the check that the followers have 3 segments fails.

It seems `log.replica.max-lag-time` is too short for the followers to sync the log in this test case. Maybe we can increase it to 5 or 10 seconds. The test would take longer, but it should be more stable.

What's more, I checked the log of a successful run:

```
**330 [ReplicaFetcherThread-0-2] INFO org.apache.fluss.server.log.LocalLog [] - Rolled new log segment at offset 0**
**7337 [fluss-netty-server-worker-thread-2] INFO org.apache.fluss.server.log.LocalLog [] - Rolled new log segment at offset 10**
...
**7870 [fluss-netty-server-worker-thread-2] INFO org.apache.fluss.server.log.LocalLog [] - Rolled new log segment at offset 20**
...
**11073 [fluss-scheduler-0-thread-8] INFO org.apache.fluss.server.replica.Replica [] - Shrink ISR From [2, 1, 0] to [2]. Leader: (high watermark: 30, end offset: 40, out of sync replicas: [0, 1])**
```

The ISR shrink (which is expected to be triggered by the code `FLUSS_CLUSTER_EXTENSION.stopReplica(stopFollower, tb, 1);`) should happen only after the records have been appended successfully.
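
To make the ordering problem concrete, here is a minimal sketch that computes the gaps between the log timestamps quoted in this comment. It is not part of the test or of Fluss: the class `IsrShrinkTiming` and the method `millisBetween` are hypothetical names used only for illustration, and it assumes that the leading numbers in the successful run's log are elapsed milliseconds.

```java
import java.time.Duration;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, for illustration only (not part of Fluss).
class IsrShrinkTiming {
    // Matches the "HH:mm:ss,SSS" prefix used in the failing run's log lines.
    private static final DateTimeFormatter LOG_TIME =
            DateTimeFormatter.ofPattern("HH:mm:ss,SSS");

    // Milliseconds elapsed between two log timestamps on the same day.
    static long millisBetween(String from, String to) {
        LocalTime start = LocalTime.parse(from, LOG_TIME);
        LocalTime end = LocalTime.parse(to, LOG_TIME);
        return Duration.between(start, end).toMillis();
    }

    public static void main(String[] args) {
        // Failing run: first segment roll -> ISR shrink.
        long firstRollToShrink = millisBetween("16:36:55,281", "16:36:56,197");
        // Failing run: ISR shrink -> roll at offset 10 (shrink came first).
        long shrinkToSecondRoll = millisBetween("16:36:56,197", "16:36:56,202");
        // Successful run (assumed to be elapsed ms): roll at offset 20 -> ISR shrink.
        long passingRollToShrink = 11073L - 7870L;

        System.out.println("failing: first roll -> shrink = " + firstRollToShrink + " ms");
        System.out.println("failing: shrink -> roll at offset 10 = " + shrinkToSecondRoll + " ms");
        System.out.println("passing: roll at offset 20 -> shrink = " + passingRollToShrink + " ms");
    }
}
```

In the failing run the ISR shrinks 916 ms after the first roll and 5 ms before the roll at offset 10, whereas in the successful run the shrink comes 3203 ms after the roll at offset 20.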