mjsax commented on PR #16970: URL: https://github.com/apache/kafka/pull/16970#issuecomment-2313771856
This test seems to be full of race conditions, and there seems to be no easy way to control them fully. I'm not sure that using more input records is, by itself, the right way to go, though. It seems to only reduce the likelihood that the test fails.

Should we maybe change the test condition instead (e.g., we could also count down `onRestoreSuspendedLatch` in the `onRestoreEnd` callback)? Or we could decrease `MAX_POLL_RECORDS_CONFIG` to `1` to slow down restoration even more, and maybe make `RESTORATION_DELAY` "dynamic", i.e., keep it at 500ms, but reduce it to zero once we have reached a condition?

For the new test failure: given that we now need to restore 1000 records, it seems we might just need more time to transition to RUNNING. Note the test log line:
```
[shouldInvokeUserDefinedGlobalStateRestoreListeners_dZty6RRJSL5X__RAIPGg-ks2-StreamThread-1] task [0_0] Suspended from RESTORING
```
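To make the two suggestions concrete, here is a rough sketch of a latch that is released from either callback, combined with a delay that drops to zero once that happens. It uses only the JDK: `TrackingListener`, `restorationDelayMs`, and `conditionReached` are hypothetical names, and the class is a stand-in that does not implement Kafka's `StateRestoreListener` (whose real callbacks take the partition, store name, and offsets). Treating "latch released" as the condition for dropping the delay is also an assumption.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicLong;

public class RestoreListenerSketch {

    // Hypothetical stand-in for the test's restore listener. It does NOT implement
    // Kafka's StateRestoreListener; the real callback arguments are omitted here.
    static class TrackingListener {
        final CountDownLatch onRestoreSuspendedLatch = new CountDownLatch(1);

        // "Dynamic" restoration delay in ms: 500 until the condition is reached, then 0.
        final AtomicLong restorationDelayMs = new AtomicLong(500L);

        // Slows restoration down by sleeping for the current delay after each batch.
        void onBatchRestored() throws InterruptedException {
            long delayMs = restorationDelayMs.get();
            if (delayMs > 0) {
                Thread.sleep(delayMs);
            }
        }

        void onRestoreSuspended() {
            conditionReached();
        }

        // Also releases the latch, so the test does not hang if restoration
        // finishes before the task is ever suspended.
        void onRestoreEnd() {
            conditionReached();
        }

        private void conditionReached() {
            onRestoreSuspendedLatch.countDown(); // no-op if already at zero
            restorationDelayMs.set(0L);          // stop slowing down restoration
        }
    }

    public static void main(String[] args) throws InterruptedException {
        TrackingListener listener = new TrackingListener();
        listener.onRestoreEnd();      // restoration finished without a suspend
        listener.onBatchRestored();   // returns immediately, the delay is now 0
        System.out.println("latch count = " + listener.onRestoreSuspendedLatch.getCount()
            + ", delay = " + listener.restorationDelayMs.get() + " ms");
    }
}
```

One trade-off: if `onRestoreEnd` also releases the latch, the latch alone no longer proves that a suspend actually happened, so the test would probably need a separate check (or a differently named latch) for that.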
