Ralph,

i noticed a file descriptor leak with current master.

it can be easily reproduced with the loop_spawn test from the ibm/dynamic test suite:

mpirun -np 1 ./loop_spawn

after a few seconds, you can see the leak via
lsof -p $(pidof mpirun)

there are a bunch of unix socket entries such as
mpirun 20791 gilles 76u unix 0xffff8800a087e580 0t0 1066703 /tmp/openmpi-sessions-1000@c7_0/7615/0/0/pmix-20791
mpirun 20791 gilles 77u unix 0xffff88009ad1d2c0 0t0 1066954 /tmp/openmpi-sessions-1000@c7_0/7615/0/0/pmix-20791
mpirun 20791 gilles 78u unix 0xffff8800a087ed00 0t0 1066823 /tmp/openmpi-sessions-1000@c7_0/7615/0/0/pmix-20791
mpirun 20791 gilles 79u unix 0xffff88009ad1cf00 0t0 1066840 /tmp/openmpi-sessions-1000@c7_0/7615/0/0/pmix-20791
mpirun 20791 gilles 80u unix 0xffff8800a087f480 0t0 1068077 /tmp/openmpi-sessions-1000@c7_0/7615/0/0/pmix-20791
mpirun 20791 gilles 81u unix 0xffff88009ad1da40 0t0 1068094 /tmp/openmpi-sessions-1000@c7_0/7615/0/0/pmix-20791
mpirun 20791 gilles 82u unix 0xffff8800a087d680 0t0 1068195 /tmp/openmpi-sessions-1000@c7_0/7615/0/0/pmix-20791
mpirun 20791 gilles 83u unix 0xffff88009ad1de00 0t0 1068212 /tmp/openmpi-sessions-1000@c7_0/7615/0/0/pmix-20791


in server_switchyard(), i noticed

    if (PMIX_FINALIZE_CMD == cmd) {
[...]
        /* turn off the recv event - we shouldn't hear anything
         * more from this proc */
        if (peer->recv_ev_active) {
            event_del(&peer->recv_event);
            peer->recv_ev_active = false;
        }
        return rc;
    }

and it looks like peer->sd is never closed

fwiw, i naively closed it here, and i got another leak (pipes)
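
to make the idea concrete, here is a minimal standalone sketch of a finalize path that releases the socket after turning off the recv event. this is not the actual pmix code: demo_peer_t, fd_is_open() and demo_peer_finalize() are hypothetical names i made up for illustration, the struct only models the two fields mentioned above, and the event_del() call is only noted in a comment so the sketch builds without libevent.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical stand-in for the server's peer object; only the two
 * fields discussed in this mail are modelled. */
typedef struct {
    int sd;               /* connected socket, -1 once closed */
    bool recv_ev_active;  /* true while the recv event is registered */
} demo_peer_t;

/* Returns true if fd still refers to an open descriptor. */
static bool fd_is_open(int fd)
{
    return fd >= 0 && fcntl(fd, F_GETFD) != -1;
}

/* Sketch of a finalize path that also releases the socket. */
static void demo_peer_finalize(demo_peer_t *peer)
{
    assert(peer != NULL);

    if (peer->recv_ev_active) {
        /* the real code calls event_del(&peer->recv_event) here */
        peer->recv_ev_active = false;
    }

    /* close the descriptor exactly once and mark it as released,
     * so a later cleanup path cannot close it a second time */
    if (peer->sd >= 0) {
        close(peer->sd);
        peer->sd = -1;
    }
}
```

resetting sd to -1 is only there to guard against a double close if the peer object is torn down again later. this sketch says nothing about the pipe leak, which presumably comes from somewhere else.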

could you please have a look at this?

Cheers,

Gilles
