OK, I'll look into this. I recently noticed a problem with static builds on
Lustre file systems, and I was wondering whether it's the same issue.
Either way, I'll check what's going on.

Thanks
Edgar

On 10/30/2012 7:22 AM, Ralph Castain wrote:
> No to Lustre, and I didn't build statically.
> 
> I'm not sure what, if any, parallel file system might be present. In the case
> that works, I just built with no configure args other than --prefix. ompi_info
> shows both the romio and ompio components built, but nothing more about what
> support they built internally.
> 
> 
> On Oct 30, 2012, at 4:14 AM, Edgar Gabriel <gabr...@cs.uh.edu> wrote:
> 
>> Ralph,
>>
>> Just out of curiosity: is there a Lustre file system on the machine, and
>> is this a static build?
>>
>> Thanks
>> Edgar
>>
>> On 10/29/2012 9:17 PM, Ralph Castain wrote:
>>> Hmmm...I added that directory and tried this on odin (which is an IB-based 
>>> machine). Any MPI proc segfaults:
>>>
>>> Core was generated by `./hello'.
>>> Program terminated with signal 11, Segmentation fault.
>>> #0  _sysio_p_validate (pno=0x0, intnt=0x0, path=0x0) at src/inode.c:574
>>> 574 src/inode.c: No such file or directory.
>>>     in src/inode.c
>>> (gdb) where
>>> #0  _sysio_p_validate (pno=0x0, intnt=0x0, path=0x0) at src/inode.c:574
>>> #1  0x00002aaaabd3f3e9 in _sysio_path_walk (parent=0x0, nd=0x7fffffffd8e0) at src/namei.c:216
>>> #2  0x00002aaaabd3faad in _sysio_namei (parent=0x0, path=<value optimized out>, flags=0, intnt=0x7fffffffd950, pnop=0x7fffffffd970) at src/namei.c:505
>>> #3  0x00002aaaabd3fd98 in open (path=0x2aaaac24280f "/sys/devices/system/node", flags=<value optimized out>) at src/open.c:179
>>> #4  0x00002aaaabd43d5b in opendir (name=0x2aaaac24280f "/sys/devices/system/node") at src/stddir.c:60
>>> #5  0x00002aaaac241825 in numa_max_node () from /usr/lib64/libnuma.so.1
>>> #6  0x00002aaaac241d13 in numa_init () from /usr/lib64/libnuma.so.1
>>> #7  0x00002aaaaaab845b in call_init () from /lib64/ld-linux-x86-64.so.2
>>> #8  0x00002aaaaaab8565 in _dl_init_internal () from /lib64/ld-linux-x86-64.so.2
>>> #9  0x00002aaaaaaabaaa in _dl_start_user () from /lib64/ld-linux-x86-64.so.2
>>> #10 0x0000000000000001 in ?? ()
>>> #11 0x00007fffffffe03c in ?? ()
>>> #12 0x0000000000000000 in ?? ()
>>>
>>> I got the same thing whether I excluded openib or not. I then ran on my
>>> Linux cluster, which doesn't have IB at all, and it ran fine. It also runs
>>> clean on the Mac. However, in both of those cases, I had left the romio IO
>>> component enabled.
>>>
>>> On odin, I always configure with --disable-io-romio. So I tried deliberately
>>> enabling romio, and everything works. This appears to be something that the
>>> IO work has broken.
>>>
>>> Edgar: can you please fix --disable-io-romio?
>>>
>>> Thanks
>>> Ralph
>>>
>>>
>>>
>>>
>>> On Oct 29, 2012, at 11:55 AM, Edgar Gabriel <gabr...@cs.uh.edu> wrote:
>>>
>>>> I'm sorry to add one more thing to the list, but beyond this file, it
>>>> looks like the entire ompi/mca/common/verbs/ directory is also missing
>>>> in the 1.7 branch, even though it is present in the trunk. That directory
>>>> is required to compile the bcol framework.
>>>>
>>>> Thanks
>>>> Edgar
>>>>
>>>>
>>>> On 10/26/2012 5:31 PM, Ralph Castain wrote:
>>>>> Okay, I'll fix it for tonight's tarball.
>>>>>
>>>>> Thanks!
>>>>>
>>>>> On Oct 26, 2012, at 3:28 PM, "Shamis, Pavel" <sham...@ornl.gov> wrote:
>>>>>
>>>>>> There is a bug in the Makefile. The file exists in svn, but it is not
>>>>>> listed in Makefile.am. As a result, it wasn't pulled into the tarball.
>>>>>>
>>>>>> Pavel (Pasha) Shamis
>>>>>> ---
>>>>>> Computer Science Research Group
>>>>>> Computer Science and Math Division
>>>>>> Oak Ridge National Laboratory
>>>>>>
>>>>>>
>>>>>> On Oct 26, 2012, at 2:33 PM, Edgar Gabriel wrote:
>>>>>>
>>>>>> We are having trouble compiling the 1.7 series on a machine in Dresden.
>>>>>> Specifically, we get an error message when compiling the
>>>>>> bcol/iboffload component (other InfiniBand components compile fine).
>>>>>>
>>>>>> Any ideas or suggestions about what we might be doing wrong or what to look for?
>>>>>>
>>>>>> make[2]: Entering directory `/home/h2/gabriel/openmpi-1.7rc4/ompi/mca/bcol/iboffload'
>>>>>> CC       bcol_iboffload_module.lo
>>>>>> CC       bcol_iboffload_mca.lo
>>>>>> CC       bcol_iboffload_endpoint.lo
>>>>>> CC       bcol_iboffload_frag.lo
>>>>>> In file included from bcol_iboffload_frag.c:16:0:
>>>>>> bcol_iboffload.h:46:36: fatal error: bcol_iboffload_qp_info.h: No such file or directory
>>>>>> compilation terminated.
>>>>>> make[2]: *** [bcol_iboffload_frag.lo] Error 1
>>>>>> make[2]: *** Waiting for unfinished jobs....
>>>>>> In file included from bcol_iboffload_mca.c:18:0:
>>>>>> bcol_iboffload.h:46:36: fatal error: bcol_iboffload_qp_info.h: No such file or directory
>>>>>> compilation terminated.
>>>>>> make[2]: *** [bcol_iboffload_mca.lo] Error 1
>>>>>> In file included from bcol_iboffload_endpoint.c:23:0:
>>>>>> bcol_iboffload.h:46:36: fatal error: bcol_iboffload_qp_info.h: No such file or directory
>>>>>> compilation terminated.
>>>>>> make[2]: *** [bcol_iboffload_endpoint.lo] Error 1
>>>>>> In file included from bcol_iboffload_module.c:39:0:
>>>>>> bcol_iboffload.h:46:36: fatal error: bcol_iboffload_qp_info.h: No such file or directory
>>>>>> compilation terminated.
>>>>>> make[2]: *** [bcol_iboffload_module.lo] Error 1
>>>>>> make[2]: Leaving directory `/home/h2/gabriel/openmpi-1.7rc4/ompi/mca/bcol/iboffload'
>>>>>> make[1]: *** [all-recursive] Error 1
>>>>>> make[1]: Leaving directory `/home/h2/gabriel/openmpi-1.7rc4/ompi'
>>>>>> make: *** [all-recursive] Error 1
>>>>>>
>>>>>> Thanks
>>>>>> Edgar
>>>>>>
>>>>>> --
>>>>>> Edgar Gabriel
>>>>>> Associate Professor
>>>>>> Parallel Software Technologies Lab      http://pstl.cs.uh.edu
>>>>>> Department of Computer Science          University of Houston
>>>>>> Philip G. Hoffman Hall, Room 524        Houston, TX-77204, USA
>>>>>> Tel: +1 (713) 743-3857                  Fax: +1 (713) 743-3335
>>>>>>
>>>>>> _______________________________________________
>>>>>> devel mailing list
>>>>>> de...@open-mpi.org
>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel