Hi -
... bringing up this old thread...
I've been working with an InfiniBand cluster using the ibv and mxm
GASNet conduits. I've observed the following:
- MXM does not work with CHPL_GASNET_SEGMENT=large
- IBV and MXM do not work with CHPL_GASNET_SEGMENT=fast
unless I disable PSHM (although probably switching it to
use SYSV shared memory instead of POSIX would work too,
but then I have kernel configuration parameters to worry about).
- Neither works on my system unless I use the SSH spawner
instead of the default MPI spawner; I have to set
export GASNET_IBV_SPAWNER=ssh
export GASNET_MXM_SPAWNER=ssh
What works (at all):
- IBV/MXM with --disable-pshm and CHPL_GASNET_SEGMENT=fast
- IBV with CHPL_GASNET_SEGMENT=large with or without PSHM
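
For reference, the working ibv + large combination can be written out as an
environment setup. This is only a sketch: the segment and spawner settings
come from the observations above, while selecting the conduit through
CHPL_COMM=gasnet and CHPL_COMM_SUBSTRATE=ibv is an assumption about how the
rest of the environment is configured.

```shell
# Sketch of the ibv settings that worked on my system.
# Assumption: the conduit is selected via CHPL_COMM / CHPL_COMM_SUBSTRATE.
export CHPL_COMM=gasnet
export CHPL_COMM_SUBSTRATE=ibv

# From the observations above: large works for ibv with or without PSHM,
# and the SSH spawner is needed instead of the default MPI spawner.
export CHPL_GASNET_SEGMENT=large
export GASNET_IBV_SPAWNER=ssh

# Print the settings so they can be checked before rebuilding.
printf '%s\n' \
  "CHPL_COMM_SUBSTRATE=$CHPL_COMM_SUBSTRATE" \
  "CHPL_GASNET_SEGMENT=$CHPL_GASNET_SEGMENT" \
  "GASNET_IBV_SPAWNER=$GASNET_IBV_SPAWNER"
```

The segment is a build-time choice, so Chapel has to be rebuilt after
changing it (a make clobber followed by a make, as described in the quoted
message further down).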
Note: I haven't reproduced the errors Rafael was getting with
jobs using a lot of memory.
Besides sharing my experience here - I have a question.
Why do we build Chapel+GASNet with --enable-pshm at all?
Since Chapel uses threads to access multiple cores/sockets
on a machine, we don't need GASNet's shared-memory
communication scheme for different processes on the same node.
I'll note that it's already disabled for a few Cray configurations...
You might bring up testing. It seems to me that during
testing where many GASNet processes are run on a single node,
using the normal network interface is more similar to
a real multi-locale run anyway, and so our testing would
be more faithful if we did not use PSHM. But if that
doesn't convince you, we could just turn off PSHM for
the IBV/MXM conduits and leave it on for UDP...
Thoughts?
-michael
On 02/10/2014 03:46 PM, Rafael Larrosa Jiménez wrote:
> Hi,
>
>> Great, thanks for checking!
>>
>> Based on that, my thought is that for ibv we should use segment fast
>> (rationale: it's faster; it's more similar to what we do elsewhere and we
>> test it more) and use the configure flags below to make it work. Rafael
>> (or anyone), do you see any reasons not to do this?
>
> One of the reasons why I took some time to answer was that I wanted to
> run some benchmarks before answering, but I couldn't.
>
> Now I have run some benchmarks and found that there are problems when
> using fast with a lot of memory. The exact error is:
>
> @ 0> snd status=5 opcode=0 dst_node=1 dst_qp=0
> @ 0> - rcv CQ contains impossibly large WCE count with status 5
> *** FATAL ERROR: aborting on reap of failed send
> NOTICE: Before reporting bugs, run with GASNET_BACKTRACE=1 in the environment
> to generate a backtrace.
> @ 0> snd status=10 opcode=2 dst_node=1 dst_qp=0
> @ 0> - rcv CQ contains impossibly large WCE count with status 10
> *** FATAL ERROR: aborting on reap of failed send
> *** Caught a fatal signal: SIGABRT(6) on node 0/2
> *** Caught a fatal signal: SIGABRT(6) on node 0/2
> @ 1> rcv comp->status=5
> @ 1> - snd CQ contains impossibly large WCE count with status 5
> *** FATAL ERROR: aborting on reap of failed recv
> NOTICE: Before reporting bugs, run with GASNET_BACKTRACE=1 in the environment
> to generate a backtrace.
> *** Caught a fatal signal: SIGABRT(6) on node 1/2
>
>
> That is using only two nodes with 64 GB of RAM each and a 16 GB array
> (8 GB on each node). With smaller problems it works fine, and the speed
> is almost the same with segment large or fast (less than 1% difference in
> my tests), although almost all of the time went to overhead rather than
> data movement. With larger data sizes the speed was much higher, but only
> large could be used, as fast fails.
>
>
>> If not, Rafael, would this be a fairly simple patch for you to write,
>> test, and submit? (we don't have easy access to an ibv cluster, which is
>> part of the reason we haven't run into this ourselves). I think it should
>> be as simple as adding a case to util/chplenv/commSegment and then adding
>> some logic to third-party/gasnet/Makefile that adds the configuration
>> options based on CHPL_MAKE_COMM_SUBSTRATE being set to ibv. (?)
>
> IMO large should be used, and the only change should be to define
> CHPL_GASNET_SEGMENT as large by default when using ibv.
>
> That can be done by modifying util/chplenv/commSegment, and changing:
>
> if (($substrate eq "portals") || ($substrate eq "gemini") ||
>     ($substrate eq "aries")) {
>     $segment = "fast";
> } else {
>     $segment = "everything";
> }
>
> To:
>
> if (($substrate eq "portals") || ($substrate eq "gemini") ||
>     ($substrate eq "aries")) {
>     $segment = "fast";
> } elsif ($substrate eq "ibv") {
>     $segment = "large";
> } else {
>     $segment = "everything";
> }
>
> I can do that and commit it if you want.
>
> Greets,
>
> Rafael
>
>> Thanks,
>> -Brad
>>
>> On Fri, 7 Feb 2014, Rafael Larrosa Jiménez wrote:
>>> Hi,
>>>
>>>> Hi Rafael --
>>>>
>>>> Here's another tip from the GASNet team. Would you be able to give
>>>> this a try and see if it helps with ibv + fast?
>>>>
>>>> ----
>>>>
>>>> The Bad Address sounds like an interaction between IBV and POSIX Shared
>>>> Memory:
>>>> https://upc-bugs.lbl.gov/bugzilla/show_bug.cgi?id=2929
>>>>
>>>> The work-around is to configure GASNet with "--enable-pshm-sysv
>>>> --disable-pshm-posix".
>>>>
>>>> ----
>>>
>>> Yes, it works fine with those changes, sorry for the delay.
>>>
>>> Greets,
>>>
>>> Rafael
>>>
>>>> Thanks,
>>>> -Brad
>>>>
>>>> On Thu, 30 Jan 2014, Brad Chamberlain wrote:
>>>>> Hi Rafael --
>>>>>
>>>>> Thanks for your analysis and description.
>>>>>
>>>>> I don't have any problem with changing the default segment for 'ibv'
>>>>> and agree that it sounds like the right thing to do if it works around
>>>>> this issue. It seems like 'fast' would be preferable to 'large' from a
>>>>> performance perspective (based on a quick glance at the docs -- I'm
>>>>> mostly unfamiliar with 'large').
>>>>>
>>>>> In trying to use the 'fast' segment, do you know whether the memory
>>>>> allocator used changed to either tcmalloc or dlmalloc (either
>>>>> automatically or manually)? That is probably necessary for correctness,
>>>>> but should've (hopefully) happened automatically. printchplenv can be
>>>>> used to verify.
>>>>>
>>>>> Note that I believe you should only need to set CHPL_GASNET_SEGMENT.
>>>>> The CHPL_MAKE_ variables should follow automatically (and are not
>>>>> intended to be set by the end-user).
>>>>>
>>>>> Thanks again,
>>>>> -Brad
>>>>>
>>>>> On Thu, 30 Jan 2014, Rafael Larrosa Jiménez wrote:
>>>>>> On Thursday, 30 January 2014 12:21:11 Akihiro Hayashi wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> Thanks for helping me to understand the problem.
>>>>>>> I think I now understand the problem correctly.
>>>>>>>
>>>>>>> Rafael Asenjo and Rafael Larrosa,
>>>>>>> I'm pretty sure my assignment involves a contiguous block, and that
>>>>>>> useBulkTransfer is active and useBulkTransferStride is not active in
>>>>>>> my Chapel compiler. I just wanted to make sure that both my code
>>>>>>> (useBulkTransfer) and Rafael Larrosa's code (useBulkTransferStride)
>>>>>>> can fail due to the ibv-conduit bug
>>>>>>> (https://upc-bugs.lbl.gov/bugzilla/show_bug.cgi?id=495).
>>>>>>
>>>>>> First, let me say that I'm not the developer, just a user of that code.
>>>>>>
>>>>>> Second, I have been writing this email for several hours, as my view of
>>>>>> the problem changed while I was reading the IB verbs manuals, the IB
>>>>>> code, etc.
>>>>>>
>>>>>> In the end I found the solution. By default Chapel makes all memory
>>>>>> accessible by RDMA, which seems to be the problem, as it can be
>>>>>> "unpinned". If instead only "large" portions are used for RDMA, then it
>>>>>> works fine, as explained in one of the messages:
>>>>>>
>>>>>> ---
>>>>>> In a GASNET_SEGMENT_FAST or _LARGE configuration the segment is obtained
>>>>>> at startup via mmap() and is never unmapped. So, sending from inside the
>>>>>> GASNet segment in these two cases will ensure this bug cannot occur.
>>>>>> ---
>>>>>>
>>>>>> So if you define:
>>>>>>
>>>>>> export CHPL_MAKE_GASNET_SEGMENT=large
>>>>>> export CHPL_MAKE_COMM_SEGMENT=large
>>>>>> export CHPL_GASNET_SEGMENT=large
>>>>>>
>>>>>> Then do a make clobber followed by a make and recompile your program;
>>>>>> it should work fine.
>>>>>>
>>>>>> BTW, when using portals, gemini or aries, the fast segment is used, as
>>>>>> explained in:
>>>>>>
>>>>>> doc/release/README.multilocale
>>>>>> ---
>>>>>> 3) Advanced GASNet users can set CHPL_GASNET_SEGMENT to choose a
>>>>>>    memory segment to use with GASNet. Current defaults are:
>>>>>>
>>>>>>    When CHPL_COMM_SUBSTRATE is...   Chapel will choose...
>>>>>>    portals                          fast
>>>>>>    gemini                           fast
>>>>>>    (other)                          everything
>>>>>>
>>>>>> ---
>>>>>>
>>>>>> But using fast with ibv gives this error:
>>>>>> *** FATAL ERROR: Unexpected error Bad address (rc=1 errno=14) when
>>>>>> registering the segment
>>>>>>
>>>>>> But that is not a problem, as large segments are the solution. I hope
>>>>>> they don't break other things :-)
>>>>>>
>>>>>> I have tested it with Akihiro's code, and it seems to work fine now.
>>>>>>
>>>>>> Perhaps it is a good idea to change the default to use large segments
>>>>>> with ibv.
>>>>>>
>>>>>> Greets,
>>>>>>
>>>>>> Rafael
>>>
>>> EMAIL: [email protected] Edificio de Bioinnovación
>>> TELEF: + 34951952788 C/ Severo Ochoa 34
>>> FAX : +34951952792 Parque Tecnológico de Andalucía
>>>
>>> 29590 Málaga (SPAIN)
_______________________________________________
Chapel-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/chapel-developers