Ralph,
I noticed a file descriptor leak with current master.
It can easily be reproduced with the loop_spawn test from the
ibm/dynamic test suite:
mpirun -np 1 ./loop_spawn
After a few seconds, you can see the leak via
lsof -p $(pidof mpirun)
There are a bunch of entries such as
mpirun 20791 gilles 76u unix 0xffff8800a087e580 0t0 1066703 /tmp/openmpi-sessions-1000@c7_0/7615/0/0/pmix-20791
mpirun 20791 gilles 77u unix 0xffff88009ad1d2c0 0t0 1066954 /tmp/openmpi-sessions-1000@c7_0/7615/0/0/pmix-20791
mpirun 20791 gilles 78u unix 0xffff8800a087ed00 0t0 1066823 /tmp/openmpi-sessions-1000@c7_0/7615/0/0/pmix-20791
mpirun 20791 gilles 79u unix 0xffff88009ad1cf00 0t0 1066840 /tmp/openmpi-sessions-1000@c7_0/7615/0/0/pmix-20791
mpirun 20791 gilles 80u unix 0xffff8800a087f480 0t0 1068077 /tmp/openmpi-sessions-1000@c7_0/7615/0/0/pmix-20791
mpirun 20791 gilles 81u unix 0xffff88009ad1da40 0t0 1068094 /tmp/openmpi-sessions-1000@c7_0/7615/0/0/pmix-20791
mpirun 20791 gilles 82u unix 0xffff8800a087d680 0t0 1068195 /tmp/openmpi-sessions-1000@c7_0/7615/0/0/pmix-20791
mpirun 20791 gilles 83u unix 0xffff88009ad1de00 0t0 1068212 /tmp/openmpi-sessions-1000@c7_0/7615/0/0/pmix-20791
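For watching the descriptor count grow without re-reading the full lsof listing, a small helper along these lines could be used. It is only a sketch and assumes a Linux /proc filesystem; demo_count_fds() and demo_count_own_fds() are hypothetical names and not part of Open MPI or PMIx.

```c
#include <assert.h>
#include <dirent.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: count the entries in /proc/<pid>/fd.
 * Linux-specific. Returns -1 if the directory cannot be opened.
 * When pid is the calling process, the result also includes the
 * descriptor that opendir() holds while scanning. */
static int demo_count_fds(pid_t pid)
{
    char path[64];
    DIR *dir;
    struct dirent *entry;
    int count = 0;

    assert(0 < pid);
    snprintf(path, sizeof(path), "/proc/%ld/fd", (long)pid);
    dir = opendir(path);
    if (NULL == dir) {
        return -1;
    }
    while (NULL != (entry = readdir(dir))) {
        if ('.' == entry->d_name[0]) {
            continue;   /* skip "." and ".." */
        }
        count++;
    }
    closedir(dir);
    return count;
}

/* Convenience wrapper for the calling process. */
static int demo_count_own_fds(void)
{
    return demo_count_fds(getpid());
}
```

Calling demo_count_fds() with the pid of mpirun every few seconds while loop_spawn runs should show a steadily increasing number if the sockets are leaked.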
In server_switchyard(), I noticed
    if (PMIX_FINALIZE_CMD == cmd) {
        [...]
        /* turn off the recv event - we shouldn't hear anything
         * more from this proc */
        if (peer->recv_ev_active) {
            event_del(&peer->recv_event);
            peer->recv_ev_active = false;
        }
        return rc;
    }
and it looks like peer->sd is never closed.
FWIW, I naively closed it here, and I got another leak (pipes).
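To make "closed it here" concrete, here is a minimal standalone sketch of that kind of cleanup. It is not a proposed patch: demo_peer_t is a hypothetical stand-in that models only the sd and recv_ev_active fields, the event_del() call is left as a comment so the sketch does not depend on libevent, and in mpirun the same change exposed the pipe leak mentioned above.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical stand-in for the peer object; only the fields that
 * matter for this report are modelled. */
typedef struct {
    int sd;               /* socket descriptor, -1 once closed */
    bool recv_ev_active;  /* true while the recv event is registered */
} demo_peer_t;

/* Release what the finalize branch would have to release. */
static void demo_peer_finalize(demo_peer_t *peer)
{
    assert(NULL != peer);
    if (peer->recv_ev_active) {
        /* the real code calls event_del(&peer->recv_event) here */
        peer->recv_ev_active = false;
    }
    if (0 <= peer->sd) {
        close(peer->sd);
        peer->sd = -1;   /* guard against a double close later on */
    }
}

/* Returns true if fd still refers to an open descriptor. */
static bool demo_fd_is_open(int fd)
{
    return -1 != fcntl(fd, F_GETFD);
}
```

Setting sd to -1 after close() is a defensive choice in case another code path releases the peer again later.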
Could you please have a look at this?
Cheers,
Gilles