Hi All,

The problem is that this is the only message I got... I also get this
warning:

--------------------------------------------------------------------------
WARNING: Open MPI will create a shared memory backing file in a directory
that appears to be mounted on a network filesystem. Creating the shared
memory backing file on a network file system, such as NFS or Lustre, is
not recommended -- it may cause excessive network traffic to your file
servers and/or cause shared memory traffic in Open MPI to be much slower
than expected.
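(For anyone hitting the same warning: both workarounds it mentions can be
passed on the mpirun command line. A sketch -- /local/scratch is just a
placeholder for whatever node-local disk your cluster has, and in this
Open MPI series orte_tmpdir_base should be the knob for the session
directory:

  # silence the warning if the file location is actually fine
  mpirun --mca shmem_mmap_enable_nfs_warning 0 -np 32 mdrun_mpi ...

  # or point the Open MPI session directory at a node-local filesystem
  mpirun --mca orte_tmpdir_base /local/scratch -np 32 mdrun_mpi ...
)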
You may want to check what the typical temporary directory is on your
node. Possible sources of the location of this temporary directory
include the $TEMPDIR, $TEMP, and $TMP environment variables.

Note, too, that system administrators can set a list of filesystems where
Open MPI is disallowed from creating temporary files by setting the MCA
parameter "orte_no_session_dir".

  Local host: n344
  Filename:   /tmp/openmpi-sessions-didymos@n344_0/19430/1/shared_mem_pool.n344

You can set the MCA parameter shmem_mmap_enable_nfs_warning to 0 to
disable this message.
--------------------------------------------------------------------------
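but I got this also with gromacs 4.5.5, which is running OK, so I think
this is not the problem in my case.

As Alexey noticed, the problem is that my nodes have different
architectures... but this was not a problem with gromacs 4.5.5.

My access node:

processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 21
model           : 1
model name      : AMD Opteron(TM) Processor 6272
stepping        : 2
cpu MHz         : 2400.003
cache size      : 2048 KB
physical id     : 0
siblings        : 16
core id         : 0
cpu cores       : 16
apicid          : 32
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc nonstop_tsc extd_apicid amd_dcm pni pclmulqdq monitor ssse3 cx16 sse4_1 sse4_2 popcnt aes xsave avx lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt nodeid_msr arat
bogomips        : 4199.99
TLB size        : 1536 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management: ts ttp tm 100mhzsteps hwpstate

My computational node:

processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 16
model           : 2
model name      : Quad-Core AMD Opteron(tm) Processor 8354
stepping        : 3
cpu MHz         : 2200.001
cache size      : 512 KB
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 4
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 5
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nonstop_tsc extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs
bogomips        : 4399.99
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate

(The difference is visible in the flags lines above: the access node has
sse4_1, sse4_2, avx and xop, which the computational node lacks. A quick
sketch for listing the gap, assuming passwordless ssh to a node; the
.flags file names are arbitrary:

  # on the access node; n344 is one of the computational nodes from the log
  grep -m1 '^flags' /proc/cpuinfo | tr ' ' '\n' | sort -u > head.flags
  ssh n344 "grep -m1 '^flags' /proc/cpuinfo" | tr ' ' '\n' | sort -u > node.flags
  comm -23 head.flags node.flags   # flags the head node has but n344 lacks
)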
Thanks a lot! Best!

tomek

On Sun, Feb 17, 2013 at 2:37 PM, Alexey Shvetsov <[email protected]> wrote:

> Hi!
>
> In a message of 16 February 2013, 23:27:45, Tomek Wlodarski wrote:
> > Hi!
> >
> > I have a problem running gromacs 4.6 in a PBS queue...
> > I end up with this error:
> >
> > [n370:03036] [[19430,0],0]-[[19430,1],8] mca_oob_tcp_msg_recv: readv
> > failed: Connection reset by peer (104)
> > --------------------------------------------------------------------------
> > mpirun noticed that process rank 18 with PID 616 on node n344 exited
> > on signal 4 (Illegal instruction).
>
> Aha. Your mdrun process got SIGILL. This means that your nodes have a
> different instruction set than the head node. So try to use a different
> acceleration level.
> Can you share details about your hw?
>
> > --------------------------------------------------------------------------
> > [n370:03036] 3 more processes have sent help message
> > help-opal-shmem-mmap.txt / mmap on nfs
> > [n370:03036] Set MCA parameter "orte_base_help_aggregate" to 0 to see
> > all help / error messages
> > 3 total processes killed (some possibly by mpirun during cleanup)
> >
> > I ran the same pbs files with the older gromacs 4.5.5 (installed with
> > the same openmpi, gcc and fftw) and everything works..
> >
> > Also, when I am running gromacs directly on the access node:
> >
> > mpirun -np 32 /home/users/didymos/gromacs/bin/mdrun_mpi -v -deffnm
> > protein-EM-solvated -c protein-EM-solvated.gro
> >
> > it runs OK.
> > Any ideas?
> > Thank you!
> > Best!
> >
> > tomek
>
> --
> Best Regards,
> Alexey 'Alexxy' Shvetsov
> Petersburg Nuclear Physics Institute, NRC Kurchatov Institute,
> Gatchina, Russia
> Department of Molecular and Radiation Biophysics
> Gentoo Team Ru
> Gentoo Linux Dev
> mailto:[email protected]
> mailto:[email protected]
> mailto:[email protected]
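Following up on Alexey's suggestion, for the archive: in gromacs 4.6 the
acceleration level is fixed when you configure the build, so one way out
is to compile for the lowest common instruction set of the two machines --
here SSE2, since the computational node has neither SSE4.1 nor AVX. A
sketch, assuming the usual out-of-source CMake build for 4.6
(GMX_CPU_ACCELERATION is the 4.6-era option name):

  # configure gromacs 4.6 for the oldest node's instruction set
  mkdir build && cd build
  cmake .. -DGMX_MPI=ON -DGMX_CPU_ACCELERATION=SSE2
  make && make install

Alternatively, configuring and compiling on one of the computational
nodes should let the build system pick a safe level automatically.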

