Baunsgaard opened a new pull request, #2472: URL: https://github.com/apache/systemds/pull/2472
FedWorkerReadMatrixCompress.verifyRead failed roughly once per ten component-test CI runs because it called FederatedTestUtils.wait(1000) to give the worker time to finish its async compression (kicked off by CompressedMatrixBlockFactory.compressAsync), then asserted that the returned block was a CompressedMatrixBlock. On a contended runner the 1 s sleep was not enough, the subsequent read returned the still-uncompressed block, and the assertion failed. Surefire's rerunFailingTestsCount=2 hid this as a "Flake" rather than a job failure.

Add FedWorkerBase.awaitCompressed(long id), which polls getMatrixBlock at 25 ms intervals for up to COMPRESS_TIMEOUT_MS (10 s) and returns as soon as the worker reports the compressed form, or returns the last-observed block on timeout so the caller's assertion still produces a meaningful failure.
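The poll-until-ready pattern described above can be sketched roughly as follows. This is not the code from the pull request: the class AwaitSketch and the method awaitCondition are hypothetical names, the Supplier stands in for a call such as getMatrixBlock(id), and the Predicate stands in for an "is this a CompressedMatrixBlock" check. Only the 25 ms interval and the 10 s timeout are taken from the description.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical sketch of a bounded polling helper; not the actual FedWorkerBase code.
final class AwaitSketch {
    // Values taken from the PR description.
    static final long COMPRESS_TIMEOUT_MS = 10_000;
    static final long POLL_INTERVAL_MS = 25;

    /**
     * Repeatedly reads a value until the predicate accepts it or the timeout expires.
     * Returns the last value read in either case, so that the caller's own
     * assertion reports what was actually observed.
     */
    public static <T> T awaitCondition(Supplier<T> read, Predicate<T> done,
                                       long timeoutMs, long pollMs) {
        final long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        T last = read.get();
        // Comparing the difference against zero keeps the check overflow-safe.
        while (!done.test(last) && System.nanoTime() - deadline < 0) {
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop waiting early.
                Thread.currentThread().interrupt();
                return last;
            }
            last = read.get();
        }
        return last;
    }

    public static void main(String[] args) {
        // Simulated worker: reports "compressed" from the third read onward.
        AtomicInteger reads = new AtomicInteger();
        Supplier<String> worker =
            () -> reads.incrementAndGet() >= 3 ? "compressed" : "uncompressed";

        String result = awaitCondition(worker, "compressed"::equals,
                                       COMPRESS_TIMEOUT_MS, POLL_INTERVAL_MS);
        System.out.println(result + " after " + reads.get() + " reads");
    }
}
```

Compared with a fixed sleep, a helper of this shape returns as soon as the condition holds, so the common case is faster than 1 s, while a slow runner gets the full timeout before the assertion is evaluated.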
