Folks,
Mellanox Jenkins marks recent PRs as failed for very surprising reasons. For example,
mpirun --mca btl sm,self ...
failed because the processes could not contact each other. I was able to
reproduce this once on my workstation,
and found the root cause was a dirty build and/or install dir.
I added some debug output to autogen.sh and found that:
- the workspace (install dir) contains some old files
- it seems all PRs use the same workspace (if it were clean, that would
be ok as long as Jenkins processes only one PR at a time)
- there are currently two PRs being processed for the ompi-release
repo, and per the log, they seem to run from the very same directory
- Jenkins for the pmix repo seems to suffer the same issue
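To illustrate what I mean by a clean, per-PR install dir, here is a rough sketch of
what a job could do before running autogen.sh. This is only an assumption about how
the job might be set up, not how the Mellanox Jenkins is actually configured: the
helper names (per_build_prefix, is_clean_dir, prepare_prefix) and the "install-N"
directory layout are hypothetical. WORKSPACE and BUILD_NUMBER are the standard
environment variables Jenkins exports to a build.

```shell
# Sketch only: helper names and the install-<N> layout are hypothetical.

# Print an install prefix that is unique to this build.
# WORKSPACE and BUILD_NUMBER come from Jenkins; fall back for local runs.
per_build_prefix() {
    echo "${WORKSPACE:-$PWD}/install-${BUILD_NUMBER:-local}"
}

# Return 0 if the directory does not exist or is empty, 1 otherwise.
is_clean_dir() {
    if [ ! -e "$1" ]; then
        return 0
    fi
    if [ -z "$(ls -A "$1")" ]; then
        return 0
    fi
    return 1
}

# Make sure the prefix exists and holds no files from a previous build,
# then print it on stdout.
prepare_prefix() {
    prefix=$(per_build_prefix)
    if ! is_clean_dir "$prefix"; then
        echo "stale files found in $prefix, removing them" >&2
        rm -rf "$prefix"
    fi
    mkdir -p "$prefix"
    echo "$prefix"
}

# Possible use in the job script (assumption):
#   prefix=$(prepare_prefix)
#   ./autogen.sh && ./configure --prefix="$prefix" && make install
```

With something like this, two PRs built at the same time would not install into the
same directory, and old files could not leak from one build into the next.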
Could someone have a look at this?
Cheers,
Gilles