Hello NM users,

We recently encountered a strange error message with NONMEM 7.5.0 on our HPC cluster. It only occurred for models with long run times (after about 1.5 days with 180 nodes). The run was terminated by this error, but it could be continued from the MSF file.
Any help would be appreciated. Please let me know if you need more information to figure this out.

Thanks a lot,
Tong

Starting 1 NONMEM executions. 1 in parallel.
S:1 ..
All executions started.

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0  0x7ae33f in ???
#1  0x7e27ef in ???
#2  0x7e4003 in ???
#3  0x7d623f in ???
#4  0x7b764c in ???
#5  0x7b79ca in ???
#6  0x7b0498 in ???
#7  0x45de7c in ???
#8  0x45e08a in ???
#9  0x591109 in ???
#10  0x5853d4 in ???
#11  0x541c05 in ???
#12  0x657652 in ???
#13  0x5914fd in ???
#14  0x404794 in ???
#15  0x8fa8c3 in ???
#16  0x8fab40 in ???
#17  0x404c4b in ???
#18  0xffffffffffffffff in ???
[proxy:0:0@nc260] HYD_pmcd_pmip_control_cmd_cb (pm/pmiserv/pmip_cb.c:887): assert (!closed) failed
[proxy:0:0@nc260] HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:76): callback returned error status
[proxy:0:0@nc260] main (pm/pmiserv/pmip.c:202): demux engine error waiting for event
[proxy:0:6@nc266] HYD_pmcd_pmip_control_cmd_cb (pm/pmiserv/pmip_cb.c:887): assert (!closed) failed
[proxy:0:4@nc264] HYD_pmcd_pmip_control_cmd_cb (pm/pmiserv/pmip_cb.c:887): assert (!closed) failed
[proxy:0:4@nc264] HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:76): callback returned error status
[proxy:0:4@nc264] main (pm/pmiserv/pmip.c:202): demux engine error waiting for event
[proxy:0:2@nc262] HYD_pmcd_pmip_control_cmd_cb (pm/pmiserv/pmip_cb.c:887): assert (!closed) failed
[proxy:0:2@nc262] HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:76): callback returned error status
[proxy:0:2@nc262] main (pm/pmiserv/pmip.c:202): demux engine error waiting for event
[proxy:0:1@nc261] HYD_pmcd_pmip_control_cmd_cb (pm/pmiserv/pmip_cb.c:887): assert (!closed) failed
[proxy:0:1@nc261] HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:76): callback returned error status
[proxy:0:1@nc261] main (pm/pmiserv/pmip.c:202): demux engine error waiting for event
[proxy:0:5@nc265] HYD_pmcd_pmip_control_cmd_cb (pm/pmiserv/pmip_cb.c:887): assert (!closed) failed
[proxy:0:5@nc265] HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:76): callback returned error status
[proxy:0:5@nc265] main (pm/pmiserv/pmip.c:202): demux engine error waiting for event
[proxy:0:6@nc266] HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:76): callback returned error status
[proxy:0:6@nc266] main (pm/pmiserv/pmip.c:202): demux engine error waiting for event
srun: error: nc262: task 2: Exited with exit code 7
srun: error: nc266: task 6: Exited with exit code 7
srun: error: nc265: task 5: Exited with exit code 7
srun: error: nc261: task 1: Exited with exit code 7
srun: error: nc264: task 4: Exited with exit code 7
srun: error: nc260: task 0: Exited with exit code 7
[mpiexec@nc260] HYDT_bscu_wait_for_completion (tools/bootstrap/utils/bscu_wait.c:76): one of the processes terminated badly; aborting
[mpiexec@nc260] HYDT_bsci_wait_for_completion (tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for completion
[mpiexec@nc260] HYD_pmci_wait_for_completion (pm/pmiserv/pmiserv_pmci.c:218): launcher returned error waiting for completion
[mpiexec@nc260] main (ui/mpich/mpiexec.c:340): process manager error waiting for completion
NONMEM run failed. Check the lst-file in NM_run1 for errors
Not restarting this model.
F:1 ..
execute done