Hmmm...well, it works fine as long as the procs are on the same node. However, if they are on different nodes, it segfaults:
[rhc@bend002 shmem]$ shmemrun -npernode 1 ./test_shmem
running on bend001
running on bend002
[bend001:06590] *** Process received signal ***
[bend001:06590] Signal: Segmentation fault (11)
[bend001:06590] Signal code: Address not mapped (1)
[bend001:06590] Failing at address: (nil)
[bend001:06590] [ 0] /lib64/libpthread.so.0() [0x307d40f500]
[bend001:06590] *** End of error message ***
[bend002][[62090,1],1][btl_tcp_frag.c:219:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv failed: Connection reset by peer (104)
--------------------------------------------------------------------------
shmemrun noticed that process rank 0 with PID 6590 on node bend001 exited
on signal 11 (Segmentation fault).
--------------------------------------------------------------------------

I would have thought it should work in that situation - yes?

On Aug 14, 2013, at 2:52 PM, Joshua Ladd <josh...@mellanox.com> wrote:

> The attached simple test code exercises the following:
>
> start_pes()
>
> shmalloc()
>
> shmem_int_get()
>
> shmem_int_put()
>
> shmem_barrier_all()
>
> To compile:
>
> shmemcc test_shmem.c -o test_shmem
>
> To launch:
>
> shmemrun -np 2 test_shmem
>
> or, for those who prefer to launch with SLURM:
>
> srun -n 2 test_shmem
>
> Josh
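[Editor's note: the test_shmem.c attachment is not reproduced in the archive. For readers following along, a minimal sketch that touches the same five calls might look like the program below; the variable names, the neighbor-exchange pattern, and the printf are illustrative assumptions, not Josh's actual test.]

    /* Minimal OpenSHMEM sketch: start_pes, shmalloc, shmem_int_put,
     * shmem_int_get, shmem_barrier_all.  Illustrative only -- not the
     * attached test_shmem.c. */
    #include <stdio.h>
    #include <shmem.h>

    int main(void)
    {
        int me, npes, peer, got;
        int *src, *dst;

        start_pes(0);                      /* initialize the SHMEM runtime */

        me   = _my_pe();                   /* this PE's rank */
        npes = _num_pes();                 /* total number of PEs */
        peer = (me + 1) % npes;            /* neighbor to exchange with */

        /* Symmetric heap allocation: usable as a remote target on every PE. */
        src = (int *) shmalloc(sizeof(int));
        dst = (int *) shmalloc(sizeof(int));
        *src = me;
        *dst = -1;

        shmem_barrier_all();               /* everyone has initialized */

        shmem_int_put(dst, src, 1, peer);  /* write my rank into the peer's dst */
        shmem_barrier_all();               /* complete all puts before reading */

        shmem_int_get(&got, dst, 1, peer); /* read back what landed on the peer */
        printf("PE %d of %d: local dst=%d, value read from PE %d: %d\n",
               me, npes, *dst, peer, got);

        shmem_barrier_all();
        shfree(src);
        shfree(dst);
        return 0;
    }

With two PEs, each one should see its partner's rank land in its local dst, and the get should return the value it deposited on its partner.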
>
> -----Original Message-----
> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Ralph Castain
> Sent: Wednesday, August 14, 2013 5:32 PM
> To: Open MPI Developers
> Subject: Re: [OMPI devel] [EXTERNAL] OpenSHMEM round 2
>
> Can you point me to a test program that would exercise it? I'd like to
> give it a try first.
>
> I'm okay with on by default, as it builds its own separate library, and
> with the RFC.
>
> On Aug 14, 2013, at 2:03 PM, "Barrett, Brian W" <bwba...@sandia.gov> wrote:
>
>> Josh -
>>
>> In general, I don't have a strong opinion on whether OpenSHMEM is on
>> by default or not. It might cause unexpected behavior for some users
>> (like on Crays, where one should really use Cray's SHMEM), but maybe
>> it's better on other platforms.
>>
>> I also would have no objection to the RFC, provided the segfaults I
>> found get resolved.
>>
>> Brian
>>
>> On 8/14/13 2:08 PM, "Joshua Ladd" <josh...@mellanox.com> wrote:
>>
>>> Ralph and Brian,
>>>
>>> Thanks a bunch for taking the time to review this. It is extremely
>>> helpful. Let me comment on the building of OSHMEM and solicit some
>>> feedback from you guys (along with the rest of the community).
>>> Originally we had planned to enable OSHMEM to build only if the
>>> '--with-oshmem' flag was passed at configure time. However,
>>> (unbeknownst to me) this behavior was changed, and now OSHMEM is
>>> built by default - i.e., yes, Ralph, this is the intended behavior
>>> now. I am wondering if this is such a good idea. Do folks have a
>>> strong opinion on this one way or the other? From my perspective I
>>> can see arguments for both sides of the coin.
>>>
>>> Other than cleaning up warnings and resolving the segfault that
>>> Brian observed, are we on a good course to getting this upstream?
>>> Is it reasonable to file an RFC for three weeks out?
>>>
>>> Josh
>>>
>>> -----Original Message-----
>>> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Barrett, Brian W
>>> Sent: Sunday, August 11, 2013 1:42 PM
>>> To: Open MPI Developers
>>> Subject: Re: [OMPI devel] [EXTERNAL] OpenSHMEM round 2
>>>
>>> Ralph -
>>>
>>> I think those warnings are just because of when they last synced
>>> with the trunk; it looks like they haven't updated in the last week,
>>> when those (and some usnic fixes) went in.
>>>
>>> More concerning is the --enable-picky stuff and the disabling of
>>> SHMEM in the right places.
>>>
>>> Brian
>>>
>>> On 8/11/13 11:24 AM, "Ralph Castain" <r...@open-mpi.org> wrote:
>>>
>>>> Turning off enable_picky, I get it to compile with the following
>>>> warnings:
>>>>
>>>> pget_elements_x_f.c:70: warning: no previous prototype for 'ompi_get_elements_x_f'
>>>> pstatus_set_elements_x_f.c:70: warning: no previous prototype for 'ompi_status_set_elements_x_f'
>>>> ptype_get_extent_x_f.c:69: warning: no previous prototype for 'ompi_type_get_extent_x_f'
>>>> ptype_get_true_extent_x_f.c:69: warning: no previous prototype for 'ompi_type_get_true_extent_x_f'
>>>> ptype_size_x_f.c:69: warning: no previous prototype for 'ompi_type_size_x_f'
>>>>
>>>> I also found that OpenSHMEM is still building by default. Is that
>>>> intended? I thought you were only going to build it if --with-shmem
>>>> (or whatever option) was given.
>>>>
>>>> Looks like some cleanup is required.
>>>>
>>>> On Aug 10, 2013, at 8:54 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>
>>>>> FWIW, I couldn't get it to build - this is on a simple Xeon-based
>>>>> system under CentOS 6.2:
>>>>>
>>>>> cc1: warnings being treated as errors
>>>>> spml_yoda_getreq.c: In function 'mca_spml_yoda_get_completion':
>>>>> spml_yoda_getreq.c:98: error: pointer targets in passing argument 1 of 'opal_atomic_add_32' differ in signedness
>>>>> ../../../../opal/include/opal/sys/amd64/atomic.h:174: note: expected 'volatile int32_t *' but argument is of type 'uint32_t *'
>>>>> spml_yoda_getreq.c:98: error: signed and unsigned type in conditional expression
>>>>> cc1: warnings being treated as errors
>>>>> spml_yoda_putreq.c: In function 'mca_spml_yoda_put_completion':
>>>>> spml_yoda_putreq.c:81: error: pointer targets in passing argument 1 of 'opal_atomic_add_32' differ in signedness
>>>>> ../../../../opal/include/opal/sys/amd64/atomic.h:174: note: expected 'volatile int32_t *' but argument is of type 'uint32_t *'
>>>>> spml_yoda_putreq.c:81: error: signed and unsigned type in conditional expression
>>>>> make[2]: *** [spml_yoda_getreq.lo] Error 1
>>>>> make[2]: *** Waiting for unfinished jobs....
>>>>> make[2]: *** [spml_yoda_putreq.lo] Error 1
>>>>> cc1: warnings being treated as errors
>>>>> spml_yoda.c: In function 'mca_spml_yoda_put_internal':
>>>>> spml_yoda.c:725: error: pointer targets in passing argument 1 of 'opal_atomic_add_32' differ in signedness
>>>>> ../../../../opal/include/opal/sys/amd64/atomic.h:174: note: expected 'volatile int32_t *' but argument is of type 'uint32_t *'
>>>>> spml_yoda.c:725: error: signed and unsigned type in conditional expression
>>>>> spml_yoda.c: In function 'mca_spml_yoda_get':
>>>>> spml_yoda.c:1107: error: pointer targets in passing argument 1 of 'opal_atomic_add_32' differ in signedness
>>>>> ../../../../opal/include/opal/sys/amd64/atomic.h:174: note: expected 'volatile int32_t *' but argument is of type 'uint32_t *'
>>>>> spml_yoda.c:1107: error: signed and unsigned type in conditional expression
>>>>> make[2]: *** [spml_yoda.lo] Error 1
>>>>> make[1]: *** [all-recursive] Error 1
>>>>>
>>>>> Only configure arguments:
>>>>>
>>>>> enable_picky=yes
>>>>> enable_debug=yes
>>>>>
>>>>> gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-3)
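[Editor's note: for anyone not building with the picky flags, those failures are ordinary -Wpointer-sign warnings promoted to errors by -Werror. The standalone sketch below reproduces the complaint and shows the usual kind of fix; atomic_add_32_like is a hypothetical stand-in whose first parameter matches the 'volatile int32_t *' type gcc cites, not the real opal_atomic_add_32.]

    #include <stdint.h>

    /* Hypothetical stand-in whose first argument matches the prototype gcc
     * cites ('volatile int32_t *'); not the actual opal_atomic_add_32. */
    static int32_t atomic_add_32_like(volatile int32_t *addr, int delta)
    {
        return __sync_add_and_fetch(addr, delta);  /* GCC builtin, illustration only */
    }

    int main(void)
    {
        uint32_t pending = 2;   /* unsigned counter, as in the spml_yoda errors */

        /* Under -Wpointer-sign -Werror the direct call fails with
         * "pointer targets in passing argument 1 ... differ in signedness":
         *
         *     atomic_add_32_like(&pending, -1);
         *
         * Typical fixes: declare the counter as int32_t, or cast explicitly. */
        atomic_add_32_like((volatile int32_t *) &pending, -1);

        return (int) pending;   /* now 1 */
    }

The companion "signed and unsigned type in conditional expression" error most likely comes from a ?: expression mixing the same signed and unsigned types; switching the counters to int32_t would presumably quiet both.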
>>>>>
>>>>> On Aug 10, 2013, at 7:21 PM, "Barrett, Brian W" <bwba...@sandia.gov> wrote:
>>>>>
>>>>>> On 8/6/13 10:30 AM, "Joshua Ladd" <josh...@mellanox.com> wrote:
>>>>>>
>>>>>>> Dear OMPI Community,
>>>>>>>
>>>>>>> Please find on Bitbucket the latest round of OSHMEM changes based
>>>>>>> on community feedback. Please git and test at your leisure.
>>>>>>>
>>>>>>> https://bitbucket.org/jladd_math/mlnx-oshmem.git
>>>>>>
>>>>>> Josh -
>>>>>>
>>>>>> In general, I think everything looks ok. However, the "right"
>>>>>> thing doesn't happen if the CM PML is used (at least, when using
>>>>>> the Portals 4 MTL). When configured with:
>>>>>>
>>>>>> ./configure --enable-mca-no-build=pml-ob1,pml-bfo,pml-v,btl,bml,mpool
>>>>>>
>>>>>> the build segfaults when trying to run a SHMEM program:
>>>>>>
>>>>>> mpirun -np 2 ./bcast
>>>>>> [shannon:90397] *** Process received signal ***
>>>>>> [shannon:90397] Signal: Segmentation fault (11)
>>>>>> [shannon:90397] Signal code: Address not mapped (1)
>>>>>> [shannon:90397] Failing at address: (nil)
>>>>>> [shannon:90398] *** Process received signal ***
>>>>>> [shannon:90398] Signal: Segmentation fault (11)
>>>>>> [shannon:90398] Signal code: Address not mapped (1)
>>>>>> [shannon:90398] Failing at address: (nil)
>>>>>> [shannon:90397] [ 0] /lib64/libpthread.so.0() [0x38b7a0f4a0]
>>>>>> [shannon:90397] *** End of error message ***
>>>>>> [shannon:90398] [ 0] /lib64/libpthread.so.0() [0x38b7a0f4a0]
>>>>>> [shannon:90398] *** End of error message ***
>>>>>> --------------------------------------------------------------------------
>>>>>> mpirun noticed that process rank 1 with PID 90398 on node shannon
>>>>>> exited on signal 11 (Segmentation fault).
>>>>>> --------------------------------------------------------------------------
>>>>>>
>>>>>> Brian
>>>>>>
>>>>>> --
>>>>>> Brian W. Barrett
>>>>>> Scalable System Software Group
>>>>>> Sandia National Laboratories
>>>
>>> --
>>> Brian W. Barrett
>>> Scalable System Software Group
>>> Sandia National Laboratories
>>
>> --
>> Brian W. Barrett
>> Scalable System Software Group
>> Sandia National Laboratories
>
> <test_shmem.c>
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel