ok, I'll look into this. I noticed a problem with static builds on Lustre file systems recently, and I was wondering whether it's the same issue or not. But I'll check what's going on.

Thanks
Edgar

On 10/30/2012 7:22 AM, Ralph Castain wrote:
> No to Lustre, and I didn't build static
>
> I'm not sure what, if any, parallel file system might be present. In the case
> that works, I just built with no configure args other than prefix. ompi_info
> shows both romio and mpio built, but nothing more about what support they
> built internally.
>
>
> On Oct 30, 2012, at 4:14 AM, Edgar Gabriel <gabr...@cs.uh.edu> wrote:
>
>> Ralph,
>>
>> just out of curiosity: is there a Lustre file system on the machine and is
>> this a static build?
>>
>> Thanks
>> Edgar
>>
>> On 10/29/2012 9:17 PM, Ralph Castain wrote:
>>> Hmmm...I added that directory and tried this on odin (which is an IB-based
>>> machine). Any MPI proc segfaults:
>>>
>>> Core was generated by `./hello'.
>>> Program terminated with signal 11, Segmentation fault.
>>> #0 _sysio_p_validate (pno=0x0, intnt=0x0, path=0x0) at src/inode.c:574
>>> 574 src/inode.c: No such file or directory.
>>> in src/inode.c
>>> (gdb) where
>>> #0 _sysio_p_validate (pno=0x0, intnt=0x0, path=0x0) at src/inode.c:574
>>> #1 0x00002aaaabd3f3e9 in _sysio_path_walk (parent=0x0, nd=0x7fffffffd8e0)
>>> at src/namei.c:216
>>> #2 0x00002aaaabd3faad in _sysio_namei (parent=0x0, path=<value optimized
>>> out>, flags=0, intnt=0x7fffffffd950, pnop=0x7fffffffd970) at src/namei.c:505
>>> #3 0x00002aaaabd3fd98 in open (path=0x2aaaac24280f
>>> "/sys/devices/system/node", flags=<value optimized out>) at src/open.c:179
>>> #4 0x00002aaaabd43d5b in opendir (name=0x2aaaac24280f
>>> "/sys/devices/system/node") at src/stddir.c:60
>>> #5 0x00002aaaac241825 in numa_max_node () from /usr/lib64/libnuma.so.1
>>> #6 0x00002aaaac241d13 in numa_init () from /usr/lib64/libnuma.so.1
>>> #7 0x00002aaaaaab845b in call_init () from /lib64/ld-linux-x86-64.so.2
>>> #8 0x00002aaaaaab8565 in _dl_init_internal () from
>>> /lib64/ld-linux-x86-64.so.2
>>> #9 0x00002aaaaaaabaaa in _dl_start_user () from /lib64/ld-linux-x86-64.so.2
>>> #10 0x0000000000000001 in ?? ()
>>> #11 0x00007fffffffe03c in ?? ()
>>> #12 0x0000000000000000 in ?? ()
>>>
>>> I got the same thing whether I excluded openib or not. I then ran on my
>>> Linux cluster, which doesn't have IB at all - and it ran fine. Also runs
>>> clean on the Mac. However, in both those cases, I had left IO romio enabled.
>>>
>>> Now on odin, I always disable-io-romio. So I tried deliberately enabling
>>> it, and everything works. So this appears to be something that the IO work
>>> has broken.
>>>
>>> Edgar: can you please fix --disable-io-romio?
>>>
>>> Thanks
>>> Ralph
>>>
>>>
>>> On Oct 29, 2012, at 11:55 AM, Edgar Gabriel <gabr...@cs.uh.edu> wrote:
>>>
>>>> I'm sorry to add one more thing to the list, but beyond this file, it
>>>> looks like the entire ompi/mca/common/verbs/ directory is also
>>>> missing in the 1.7 branch, but is required to compile the bcol
>>>> framework. It is there in the trunk, but missing in the 1.7 branch...
>>>>
>>>> Thanks
>>>> Edgar
>>>>
>>>>
>>>> On 10/26/2012 5:31 PM, Ralph Castain wrote:
>>>>> Okay, I'll fix for tonight's tarball.
>>>>>
>>>>> Thanks!
>>>>>
>>>>> On Oct 26, 2012, at 3:28 PM, "Shamis, Pavel" <sham...@ornl.gov> wrote:
>>>>>
>>>>>> There is a bug in the makefile. The file exists in svn, but it is not
>>>>>> listed in the Makefile.am. As a result, it wasn't pulled into the tarball.
>>>>>>
>>>>>> Pavel (Pasha) Shamis
>>>>>> ---
>>>>>> Computer Science Research Group
>>>>>> Computer Science and Math Division
>>>>>> Oak Ridge National Laboratory
>>>>>>
>>>>>>
>>>>>> On Oct 26, 2012, at 2:33 PM, Edgar Gabriel wrote:
>>>>>>
>>>>>> we have trouble compiling the 1.7 series on a machine in Dresden.
>>>>>> Specifically, we receive an error message when compiling the
>>>>>> bcol/iboffload component (other infiniband components compile fine).
>>>>>>
>>>>>> Any idea/suggestions what we might be doing wrong or what to look for?
>>>>>>
>>>>>> make[2]: Entering directory
>>>>>> `/home/h2/gabriel/openmpi-1.7rc4/ompi/mca/bcol/iboffload'
>>>>>>   CC     bcol_iboffload_module.lo
>>>>>>   CC     bcol_iboffload_mca.lo
>>>>>>   CC     bcol_iboffload_endpoint.lo
>>>>>>   CC     bcol_iboffload_frag.lo
>>>>>> In file included from bcol_iboffload_frag.c:16:0:
>>>>>> bcol_iboffload.h:46:36: fatal error: bcol_iboffload_qp_info.h: No such
>>>>>> file or directory
>>>>>> compilation terminated.
>>>>>> make[2]: *** [bcol_iboffload_frag.lo] Error 1
>>>>>> make[2]: *** Waiting for unfinished jobs....
>>>>>> In file included from bcol_iboffload_mca.c:18:0:
>>>>>> bcol_iboffload.h:46:36: fatal error: bcol_iboffload_qp_info.h: No such
>>>>>> file or directory
>>>>>> compilation terminated.
>>>>>> make[2]: *** [bcol_iboffload_mca.lo] Error 1
>>>>>> In file included from bcol_iboffload_endpoint.c:23:0:
>>>>>> bcol_iboffload.h:46:36: fatal error: bcol_iboffload_qp_info.h: No such
>>>>>> file or directory
>>>>>> compilation terminated.
>>>>>> make[2]: *** [bcol_iboffload_endpoint.lo] Error 1
>>>>>> In file included from bcol_iboffload_module.c:39:0:
>>>>>> bcol_iboffload.h:46:36: fatal error: bcol_iboffload_qp_info.h: No such
>>>>>> file or directory
>>>>>> compilation terminated.
>>>>>> make[2]: *** [bcol_iboffload_module.lo] Error 1
>>>>>> make[2]: Leaving directory
>>>>>> `/home/h2/gabriel/openmpi-1.7rc4/ompi/mca/bcol/iboffload'
>>>>>> make[1]: *** [all-recursive] Error 1
>>>>>> make[1]: Leaving directory `/home/h2/gabriel/openmpi-1.7rc4/ompi'
>>>>>> make: *** [all-recursive] Error 1
>>>>>>
>>>>>> Thanks
>>>>>> Edgar
>>>>>>
>>>>>> --
>>>>>> Edgar Gabriel
>>>>>> Associate Professor
>>>>>> Parallel Software Technologies Lab      http://pstl.cs.uh.edu
>>>>>> Department of Computer Science          University of Houston
>>>>>> Philip G. Hoffman Hall, Room 524        Houston, TX-77204, USA
>>>>>> Tel: +1 (713) 743-3857                  Fax: +1 (713) 743-3335
>>>>>>
>>>>>> _______________________________________________
>>>>>> devel mailing list
>>>>>> de...@open-mpi.org
>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel

--
Edgar Gabriel
Associate Professor
Parallel Software Technologies Lab      http://pstl.cs.uh.edu
Department of Computer Science          University of Houston
Philip G. Hoffman Hall, Room 524        Houston, TX-77204, USA
Tel: +1 (713) 743-3857                  Fax: +1 (713) 743-3335