Baunsgaard opened a new pull request, #2517:
URL: https://github.com/apache/systemds/pull/2517

   
   The federated test group `**.functions.federated.monitoring.**,**.functions.federated.multitenant.**`
was the only one running at the default Surefire parallelism
(`parallel=classes`, `threadCount=2`). Its tests spawn federated worker JVMs on
fixed ports, run Spark, and share the static `/tmp/systemds` working directory,
so the two test classes running concurrently in each fork race for those
resources.
   
   ## Symptoms
   
   - `Failed to create non-existing local working directory: /tmp/systemds`
   - `Federated worker processes on port N died before becoming ready`
   - All tests finish, but a leaked worker or Spark thread keeps the forked JVM
alive until the 30-minute job timeout cancels it.
   
   ## Change
   
   - Run the group with `-Dtest-threadCount=1 -Dtest-forkCount=1`, matching
every other federated group (the `federated.primitives.part1-5` groups already
use these settings). The classes then execute serially and no longer contend
for ports or the shared working directory.
   


-- 
This is an automated message from the Apache Git Service.