Source: dolfin
Followup-For: Bug #920546

I have a hunch the timeout problem might be related to
oversubscription of CPUs in MPI runs.

(In principle the same would apply to the Python MPI tests;
presumably the Python/MPI interface "slows down" messages enough to
avoid the race condition.)

I've uploaded 2018.1.0.post1-18 to print the number of available CPUs
at test time, to test if oversubscription is a plausible explanation.
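For reference, a minimal sketch of how the available CPU count can be
printed at test time. The use of os.sched_getaffinity is my suggestion,
not necessarily what the uploaded package does; it respects taskset and
cgroup limits on build daemons, unlike os.cpu_count().

```python
import os

# Prefer the CPUs this process is actually allowed to run on
# (respects taskset/cgroup restrictions on buildds).
try:
    ncpu = len(os.sched_getaffinity(0))
except AttributeError:
    # sched_getaffinity is Linux-only; fall back to the raw count.
    ncpu = os.cpu_count() or 1

print("available CPUs:", ncpu)
```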

Currently oversubscription is permitted at up to 2 jobs per CPU. The
demos use 3 processes each.  So if 4 CPUs are available then 2 jobs
(6 processes) are run, which is 50% oversubscription.
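The arithmetic above can be written out as a small helper (an
illustrative sketch; the function name is mine, not from the test
harness):

```python
def oversubscription_pct(ncpu, jobs, procs_per_job):
    """Percent by which the running processes exceed available CPUs."""
    total_procs = jobs * procs_per_job
    return max(0, total_procs - ncpu) * 100 // ncpu

# The case from the text: 4 CPUs, 2 concurrent demo jobs of
# 3 MPI processes each -> 6 processes on 4 CPUs.
print(oversubscription_pct(4, 2, 3))  # -> 50
```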

If that is the case and correlates with MPI C++ timeouts, then the
next step is to strictly never oversubscribe (though if only 1 or 2
CPUs are available, the first job of 3 processes must still be
oversubscribed).
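The "strictly never oversubscribe, except for the mandatory first job"
policy could look like this (a sketch under the assumptions above; the
3-processes-per-demo figure comes from the text, the function name is
hypothetical):

```python
def max_concurrent_jobs(ncpu, procs_per_job=3):
    """Number of demo jobs to run concurrently without oversubscribing.

    Always allow at least one job: with only 1 or 2 CPUs, that single
    3-process job is unavoidably oversubscribed.
    """
    return max(1, ncpu // procs_per_job)

# 4 CPUs: only 1 job of 3 processes fits without oversubscription.
print(max_concurrent_jobs(4))  # -> 1
# 1 CPU: still 1 job, necessarily oversubscribed.
print(max_concurrent_jobs(1))  # -> 1
```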
