One more note:

> The strided code allocates and frees buffers for pack/unpack which can
> trigger the bug.
> 
> One additional workaround is to disable the problematic pack/unpack
> implementation by setting the environment variable GASNET_VIS_AMPIPE=0.
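
As a rough sketch of applying that setting from a wrapper script (Python
here, only for illustration): the helper names below are made up, and only
the variable name GASNET_VIS_AMPIPE and the value 0 come from the note above.
Whether the variable actually reaches every remote process depends on the
job launcher, which this sketch does not address.

```python
import os
import subprocess


def env_with_vis_workaround(base_env=None):
    """Return a copy of the environment with GASNET_VIS_AMPIPE=0 set.

    Hypothetical helper; only the variable name and value come from the note.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["GASNET_VIS_AMPIPE"] = "0"
    return env


def run_with_workaround(argv):
    """Run argv (a program or launcher command line) with the setting applied.

    Returns the exit code of the child process.
    """
    return subprocess.run(argv, env=env_with_vis_workaround()).returncode
```

For example, run_with_workaround(["./a.out", "-nl", "2"]) would start a
(hypothetical) Chapel executable on two locales with the variable set in the
launching process's environment.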

-Brad


On Wed, 29 Jan 2014, Brad Chamberlain wrote:

>
> Hi Rafael and Akihiro --
>
> I should've been a bit more patient in my response.  After passing along the 
> "strided" bit, the response came back:
>
>> The implementation of strided operations is KNOWN to trigger the bug I 
>> referenced. So, have a look at the work-arounds in the bug report.
>
> which is here:
>
> https://upc-bugs.lbl.gov/bugzilla/show_bug.cgi?id=495
>
> -Brad
>
>
> On Wed, 29 Jan 2014, Brad Chamberlain wrote:
>
>> 
>> Hi Rafael and Akihiro --
>> 
>> It sounds as though there isn't a known issue w.r.t. large ibv conduit 
>> messages (I didn't catch the importance of 'strided', so sent that along in 
>> a second message, but don't expect it'll change the response).  They
>> wrote:
>> 
>>> We don't have a known error with long messages, but do have one with 
>>> respect to free() which might be the problem.  If we (ibv-conduit via 
>>> firehose) cache a dynamic memory registration (especially problematic w/ 
>>> SEGMENT_EVERYTHING), then it is possible that memory is free()d and a later 
>>> malloc() gets the same virtual address.  If that happens then ibv may end 
>>> up performing RDMA from the physical pages corresponding to the PREVIOUS 
>>> association for the virtual address (NOTE: the pages are ref-counted and 
>>> thus NOT truly free and NOT mapped into some other process). See 
>>> https://upc-bugs.lbl.gov/bugzilla/show_bug.cgi?id=495 for some details on 
>>> that bug. The work-arounds are to disable mmap-based malloc(), or disable 
>>> firehose.
>> 
>> To me, this doesn't sound like the same thing, but I'm far enough away from 
>> the problem that you may recognize something that I'm not.
>> 
>> Assuming it doesn't, how hard would it be to put together a small C+GASNet 
>> test that exhibits this issue?  Would it be as simple as sending a large 
>> buffer in strided mode?  In a loop?
>> 
>> As long as I was bothering them, I also asked whether there was a way to 
>> sanity check that an executable was built with debugging on (since there 
>> are so many ways that we could get this wrong) and got the response:
>> 
>>> As for checking for debugging support, the preprocessor token 
>>> GASNET_CONFIG_STRING will tell you a lot about the configuration you've 
>>> compiled with.  We do some name-shifting to ensure you can't link with a 
>>> library of a different configuration than you've compiled with.
>>> 
>>> Alternatively if you have the "ident" utility for finding RCS strings, 
>>> applying it to the executable file will extract lots of configuration 
>>> bits. The value of GASNET_CONFIG_STRING will follow "$GASNetConfig:" You 
>>> can fake that with:
>>>   $ perl -n -ln044 -e 'print if /GASNetConfig:/' -- a.out
>> 
>> Thanks,
>> -Brad
>> 
>> 
>
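
For anyone without perl or "ident" handy, the same lookup quoted above can be
sketched in Python. This is only an approximation of the perl one-liner: it
splits the file on '$' (which is what -0044 does) and keeps the pieces that
contain "GASNetConfig:". The function names are made up for illustration.

```python
def gasnet_config_strings(data, key=b"GASNetConfig:"):
    """Split raw bytes on '$' and return the records that contain key.

    Mirrors the quoted perl one-liner; records are decoded as ASCII with
    undecodable bytes replaced, and surrounding whitespace is trimmed.
    """
    return [
        record.decode("ascii", errors="replace").strip()
        for record in data.split(b"$")
        if key in record
    ]


def gasnet_config_from_file(path):
    """Read an executable (for example a.out) and extract its config records."""
    with open(path, "rb") as f:
        return gasnet_config_strings(f.read())
```

Printing each entry of gasnet_config_from_file("a.out") should then show the
embedded configuration text, which can be inspected for the debugging-related
settings mentioned above.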

_______________________________________________
Chapel-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/chapel-developers