Baunsgaard opened a new pull request, #2472:
URL: https://github.com/apache/systemds/pull/2472

   FedWorkerReadMatrixCompress.verifyRead failed roughly once per ten 
component-test CI runs because it called FederatedTestUtils.wait(1000) to give 
the worker time to finish its async compression (kicked off by 
CompressedMatrixBlockFactory.compressAsync), then asserted that the returned 
block was a CompressedMatrixBlock. On a contended runner the 1 s sleep was not 
enough: the subsequent read returned the still-uncompressed block and the 
assertion failed. Surefire's rerunFailingTestsCount=2 hid this as a "Flake" 
rather than a job failure.
   
   Add FedWorkerBase.awaitCompressed(long id), which polls getMatrixBlock at 25 
ms intervals for up to COMPRESS_TIMEOUT_MS (10 s) and returns as soon as the 
worker reports the compressed form, or returns the last-observed block on 
timeout so the caller's assertion still produces a meaningful failure.
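
   For readers who want the shape of the polling approach without opening the 
diff, here is a minimal, self-contained sketch. It is not the code from this PR: 
the class name PollUntil, the generic await signature, and the string stand-ins 
for matrix blocks are all hypothetical, chosen only to illustrate "poll at a 
fixed interval, return early on success, return the last-observed value on 
timeout". Only the 25 ms interval and the 10 s timeout are taken from the 
description above.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical illustration only; not the FedWorkerBase implementation.
final class PollUntil {
    static final long POLL_INTERVAL_MS = 25;        // interval named in the PR text
    static final long COMPRESS_TIMEOUT_MS = 10_000; // timeout named in the PR text

    /**
     * Reads a value repeatedly until `done` accepts it or the timeout elapses.
     * Always returns the last value read, so a caller's assertion on the
     * result fails with the actual (unexpected) value instead of a null.
     */
    static <T> T await(Supplier<T> read, Predicate<T> done,
                       long timeoutMs, long intervalMs) {
        final long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        T last = read.get();
        // Compare via subtraction so the check stays correct if nanoTime wraps.
        while (!done.test(last) && System.nanoTime() - deadline < 0) {
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return last;
            }
            last = read.get();
        }
        return last;
    }

    public static void main(String[] args) {
        // Stand-in for a worker whose block becomes compressed on the third read.
        AtomicInteger reads = new AtomicInteger();
        Supplier<String> worker =
            () -> reads.incrementAndGet() >= 3 ? "compressed" : "uncompressed";
        Predicate<String> isCompressed = "compressed"::equals;

        String block = await(worker, isCompressed,
                             COMPRESS_TIMEOUT_MS, POLL_INTERVAL_MS);
        System.out.println(block + " after " + reads.get() + " reads");
    }
}
```

   Returning the last-observed value rather than throwing on timeout keeps the 
failure at the caller's assertion, where the message names the type that was 
actually seen.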

