Hmmm...well, it works fine as long as the procs are on the same node. However, 
if they are on different nodes, it segfaults:

[rhc@bend002 shmem]$ shmemrun -npernode 1 ./test_shmem
running on bend001
running on bend002
[bend001:06590] *** Process received signal ***
[bend001:06590] Signal: Segmentation fault (11)
[bend001:06590] Signal code: Address not mapped (1)
[bend001:06590] Failing at address: (nil)
[bend001:06590] [ 0] /lib64/libpthread.so.0() [0x307d40f500]
[bend001:06590] *** End of error message ***
[bend002][[62090,1],1][btl_tcp_frag.c:219:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv failed: Connection reset by peer (104)
--------------------------------------------------------------------------
shmemrun noticed that process rank 0 with PID 6590 on node bend001 exited on 
signal 11 (Segmentation fault).
--------------------------------------------------------------------------

I would have thought it should work in that situation - yes?
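
For reference, the test described in Josh's note below exercises start_pes(),
shmalloc(), shmem_int_get(), shmem_int_put(), and shmem_barrier_all(). The
attached test_shmem.c is not reproduced here; a minimal sketch of a test along
those lines (the ring pattern, variable names, and sizes below are
illustrative, not the attachment's contents) would look something like:

/* sketch_shmem.c -- illustrative only, not the attached test_shmem.c */
#include <stdio.h>
#include <unistd.h>
#include <shmem.h>

int main(void)
{
    char host[64];

    start_pes(0);                          /* initialize the SHMEM runtime */
    int me   = _my_pe();                   /* this PE's rank */
    int npes = _num_pes();                 /* total number of PEs */

    gethostname(host, sizeof(host));
    printf("running on %s\n", host);

    /* Symmetric heap allocation: valid target for remote put/get. */
    int *src = (int *) shmalloc(sizeof(int));
    int *dst = (int *) shmalloc(sizeof(int));
    *src = me;
    *dst = -1;

    shmem_barrier_all();                   /* everyone is initialized */

    int peer = (me + 1) % npes;            /* simple ring partner */

    shmem_int_put(dst, src, 1, peer);      /* write my rank into peer's dst */
    shmem_barrier_all();                   /* complete remote updates */

    int remote = -1;
    shmem_int_get(&remote, src, 1, peer);  /* read peer's src into local var */

    printf("PE %d of %d: dst=%d, got %d from PE %d\n",
           me, npes, *dst, remote, peer);

    shmem_barrier_all();
    shfree(dst);
    shfree(src);
    return 0;
}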


On Aug 14, 2013, at 2:52 PM, Joshua Ladd <josh...@mellanox.com> wrote:

> The following simple test code will exercise the following:
> 
> start_pes()
> 
> shmalloc()
> 
> shmem_int_get()  
> 
> shmem_int_put()
> 
> shmem_barrier_all()  
> 
> To compile:
> 
> shmemcc test_shmem.c -o test_shmem 
> 
> To launch:
> 
> shmemrun -np 2  test_shmem
> 
> or for those who prefer to launch with SLURM
> 
> srun -n 2 test_shmem
> 
> Josh
> 
> 
> -----Original Message-----
> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Ralph Castain
> Sent: Wednesday, August 14, 2013 5:32 PM
> To: Open MPI Developers
> Subject: Re: [OMPI devel] [EXTERNAL] OpenSHMEM round 2
> 
> Can you point me to a test program that would exercise it? I'd like to give 
> it a try first.
> 
> I'm okay with it being on by default, as it builds its own separate 
> library, and I'm okay with the RFC.
> 
> On Aug 14, 2013, at 2:03 PM, "Barrett, Brian W" <bwba...@sandia.gov> wrote:
> 
>> Josh -
>> 
>> In general, I don't have a strong opinion of whether OpenSHMEM is on 
>> by default or not.  It might cause unexpected behavior for some users 
>> (like on Crays, where one should really use Cray's SHMEM), but maybe 
>> it's better on other platforms.
>> 
>> I also would have no objection to the RFC, provided the segfaults I 
>> found get resolved.
>> 
>> Brian
>> 
>> On 8/14/13 2:08 PM, "Joshua Ladd" <josh...@mellanox.com> wrote:
>> 
>>> Ralph, and Brian
>>> 
>>> Thanks a bunch for taking the time to review this. It is extremely 
>>> helpful. Let me comment on the building of OSHMEM and solicit some 
>>> feedback from you guys (along with the rest of the community). 
>>> Originally we had planned to enable OSHMEM to build only if the 
>>> '--with-oshmem' flag was passed at configure time. However, 
>>> (unbeknownst to me) this behavior was changed, and OSHMEM is now built 
>>> by default; so yes, Ralph, this is the intended behavior now. I am 
>>> wondering if this is such a good idea. Do folks have a strong opinion 
>>> on this one way or the other? From my perspective I can see arguments 
>>> for both sides of the coin.
>>> 
>>> Other than cleaning up the warnings and resolving the segfault that 
>>> Brian observed, are we on a good course toward getting this upstream? 
>>> Is it reasonable to file an RFC for three weeks out?
>>> 
>>> Josh
>>> 
>>> -----Original Message-----
>>> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Barrett, 
>>> Brian W
>>> Sent: Sunday, August 11, 2013 1:42 PM
>>> To: Open MPI Developers
>>> Subject: Re: [OMPI devel] [EXTERNAL] OpenSHMEM round 2
>>> 
>>> Ralph -
>>> 
>>> I think those warnings are just because of when they last synced with 
>>> the trunk; it looks like they haven't updated in the last week, when 
>>> those (and some usnic fixes) went in.
>>> 
>>> More concerning is the --enable-picky stuff and the disabling of 
>>> SHMEM in the right places.
>>> 
>>> Brian
>>> 
>>> On 8/11/13 11:24 AM, "Ralph Castain" <r...@open-mpi.org> wrote:
>>> 
>>>> Turning off enable_picky, I get it to compile with the following
>>>> warnings:
>>>> 
>>>> pget_elements_x_f.c:70: warning: no previous prototype for 
>>>> 'ompi_get_elements_x_f'
>>>> pstatus_set_elements_x_f.c:70: warning: no previous prototype for 
>>>> 'ompi_status_set_elements_x_f'
>>>> ptype_get_extent_x_f.c:69: warning: no previous prototype for 
>>>> 'ompi_type_get_extent_x_f'
>>>> ptype_get_true_extent_x_f.c:69: warning: no previous prototype for 
>>>> 'ompi_type_get_true_extent_x_f'
>>>> ptype_size_x_f.c:69: warning: no previous prototype for 
>>>> 'ompi_type_size_x_f'
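>>>> 
>>>> (These are standard -Wmissing-prototypes warnings. As a generic,
>>>> hypothetical sketch of the fix they call for, and not the actual OMPI
>>>> sources: declare the function before its definition, usually by
>>>> including the header that already declares it.)
>>>> 
>>>> void my_binding_f(int *ierr);   /* prototype, normally from a header */
>>>> 
>>>> void my_binding_f(int *ierr)    /* definition no longer warns */
>>>> {
>>>>     *ierr = 0;                  /* placeholder body */
>>>> }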
>>>> 
>>>> I also found that OpenSHMEM is still built by default. Is that 
>>>> intended? I thought you were only going to build it if --with-shmem (or 
>>>> whatever option) was given.
>>>> 
>>>> Looks like some cleanup is required
>>>> 
>>>> On Aug 10, 2013, at 8:54 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>> 
>>>>> FWIW, I couldn't get it to build - this is on a simple Xeon-based 
>>>>> system under CentOS 6.2:
>>>>> 
>>>>> cc1: warnings being treated as errors
>>>>> spml_yoda_getreq.c: In function 'mca_spml_yoda_get_completion':
>>>>> spml_yoda_getreq.c:98: error: pointer targets in passing argument 1 
>>>>> of 'opal_atomic_add_32' differ in signedness
>>>>> ../../../../opal/include/opal/sys/amd64/atomic.h:174: note: 
>>>>> expected 'volatile int32_t *' but argument is of type 'uint32_t *'
>>>>> spml_yoda_getreq.c:98: error: signed and unsigned type in 
>>>>> conditional expression
>>>>> cc1: warnings being treated as errors
>>>>> spml_yoda_putreq.c: In function 'mca_spml_yoda_put_completion':
>>>>> spml_yoda_putreq.c:81: error: pointer targets in passing argument 1 
>>>>> of 'opal_atomic_add_32' differ in signedness
>>>>> ../../../../opal/include/opal/sys/amd64/atomic.h:174: note: 
>>>>> expected 'volatile int32_t *' but argument is of type 'uint32_t *'
>>>>> spml_yoda_putreq.c:81: error: signed and unsigned type in 
>>>>> conditional expression
>>>>> make[2]: *** [spml_yoda_getreq.lo] Error 1
>>>>> make[2]: *** Waiting for unfinished jobs....
>>>>> make[2]: *** [spml_yoda_putreq.lo] Error 1
>>>>> cc1: warnings being treated as errors
>>>>> spml_yoda.c: In function 'mca_spml_yoda_put_internal':
>>>>> spml_yoda.c:725: error: pointer targets in passing argument 1 of 
>>>>> 'opal_atomic_add_32' differ in signedness
>>>>> ../../../../opal/include/opal/sys/amd64/atomic.h:174: note: 
>>>>> expected 'volatile int32_t *' but argument is of type 'uint32_t *'
>>>>> spml_yoda.c:725: error: signed and unsigned type in conditional 
>>>>> expression
>>>>> spml_yoda.c: In function 'mca_spml_yoda_get':
>>>>> spml_yoda.c:1107: error: pointer targets in passing argument 1 of 
>>>>> 'opal_atomic_add_32' differ in signedness
>>>>> ../../../../opal/include/opal/sys/amd64/atomic.h:174: note: 
>>>>> expected 'volatile int32_t *' but argument is of type 'uint32_t *'
>>>>> spml_yoda.c:1107: error: signed and unsigned type in conditional 
>>>>> expression
>>>>> make[2]: *** [spml_yoda.lo] Error 1
>>>>> make[1]: *** [all-recursive] Error 1
>>>>> 
>>>>> Only configure arguments:
>>>>> 
>>>>> enable_picky=yes
>>>>> enable_debug=yes
>>>>> 
>>>>> 
>>>>> gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-3)
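>>>>> 
>>>>> All of the picky errors above come from passing a 'uint32_t *' where
>>>>> opal_atomic_add_32() expects a 'volatile int32_t *'. Purely as an
>>>>> illustration (hypothetical counter name and function, not the actual
>>>>> spml_yoda source), the offending pattern and the kind of fix it calls
>>>>> for look like this:
>>>>> 
>>>>> #include <stdint.h>
>>>>> #include "opal/sys/atomic.h"        /* opal_atomic_add_32() */
>>>>> 
>>>>> static uint32_t n_active_gets;      /* hypothetical unsigned counter */
>>>>> 
>>>>> static void completion_sketch(void)
>>>>> {
>>>>>     /* This is what trips --enable-picky (uint32_t * argument): */
>>>>>     /* opal_atomic_add_32(&n_active_gets, -1); */
>>>>> 
>>>>>     /* Likely fix: declare the counter as int32_t, or cast the call: */
>>>>>     opal_atomic_add_32((volatile int32_t *) &n_active_gets, -1);
>>>>> }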
>>>>> 
>>>>> 
>>>>> 
>>>>> On Aug 10, 2013, at 7:21 PM, "Barrett, Brian W" 
>>>>> <bwba...@sandia.gov>
>>>>> wrote:
>>>>> 
>>>>>> On 8/6/13 10:30 AM, "Joshua Ladd" <josh...@mellanox.com> wrote:
>>>>>> 
>>>>>>> Dear OMPI Community,
>>>>>>> 
>>>>>>> Please find on Bitbucket the latest round of OSHMEM changes based 
>>>>>>> on community feedback. Please git and test at your leisure.
>>>>>>> 
>>>>>>> https://bitbucket.org/jladd_math/mlnx-oshmem.git
>>>>>> 
>>>>>> Josh -
>>>>>> 
>>>>>> In general, I think everything looks ok.  However, the "right" thing 
>>>>>> doesn't happen if the CM PML is used (at least, when using the 
>>>>>> Portals 4 MTL).  When configured with:
>>>>>> 
>>>>>> ./configure --enable-mca-no-build=pml-ob1,pml-bfo,pml-v,btl,bml,mpool
>>>>>> 
>>>>>> The build segfaults trying to run a SHMEM program:
>>>>>> 
>>>>>> mpirun -np 2 ./bcast
>>>>>> [shannon:90397] *** Process received signal ***
>>>>>> [shannon:90397] Signal: Segmentation fault (11)
>>>>>> [shannon:90397] Signal code: Address not mapped (1)
>>>>>> [shannon:90397] Failing at address: (nil)
>>>>>> [shannon:90398] *** Process received signal ***
>>>>>> [shannon:90398] Signal: Segmentation fault (11)
>>>>>> [shannon:90398] Signal code: Address not mapped (1)
>>>>>> [shannon:90398] Failing at address: (nil)
>>>>>> [shannon:90397] [ 0] /lib64/libpthread.so.0() [0x38b7a0f4a0]
>>>>>> [shannon:90397] *** End of error message ***
>>>>>> [shannon:90398] [ 0] /lib64/libpthread.so.0() [0x38b7a0f4a0]
>>>>>> [shannon:90398] *** End of error message ***
>>>>>> 
>>>>>> --------------------------------------------------------------------------
>>>>>> mpirun noticed that process rank 1 with PID 90398 on node shannon 
>>>>>> exited on signal 11 (Segmentation fault).
>>>>>> --------------------------------------------------------------------------
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> Brian
>>>>>> 
>>>>>> --
>>>>>> Brian W. Barrett
>>>>>> Scalable System Software Group
>>>>>> Sandia National Laboratories
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>>> 
>>> 
>>> 
>>> --
>>> Brian W. Barrett
>>> Scalable System Software Group
>>> Sandia National Laboratories
>>> 
>>> 
>>> 
>>> 
>>> 
>> 
>> 
>> --
>> Brian W. Barrett
>> Scalable System Software Group
>> Sandia National Laboratories
>> 
>> 
>> 
> 
> <test_shmem.c>
