Does your machine have multiple NUMA domains (more than one CPU socket)? If so, parallel initialization could be giving you bad memory affinity for your serial for-loops.
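To make the first-touch issue concrete: on a NUMA system, pages tend to be allocated near whichever task first writes them, so the style of initialization determines which NUMA domain later loops read from. A minimal sketch of the two styles (the names `n`, `D`, `a`, and `b` are made up for illustration):

```chapel
config const n = 100_000_000;
const D = {1..n};

// Parallel first-touch: default array initialization runs in parallel,
// so pages get spread across NUMA domains -- good when the loops that
// follow are also parallel (e.g. forall), as in STREAM.
var a: [D] real;

// Serial first-touch: a serial loop expression touches every page from
// one task, so pages land in that task's NUMA domain -- good when the
// loops that follow are serial for-loops.
var b: [D] real = for i in D do 0.0;
```

Which one wins depends entirely on how the rest of the program accesses the arrays, which is why the default changed for parallel codes but can hurt serial ones.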
Switching to parallel initialization resulted in a ~2X speedup for our STREAM benchmark because it meant our memory first-touch now matched how subsequent parallel loops were accessing memory. You could be seeing the opposite, where serial first-touch is really what you want, but parallel initialization is resulting in bad affinity for your for-loops. http://chapel.cray.com/releaseNotes/1.12/05-Optimizations.pdf (slides 15-20) has some more info on our decision to switch to parallel array initialization by default, and the performance impact it had on STREAM.

To minimize mailing list noise, feel free to send the video off-list, and once we figure out what's going on we can send a summary for those who might be interested.

Elliot

> I actually don't think there's any problem with parallel initialization. It
> could well be happening, but shouldn't be causing a dramatic slowdown.
>
> I would like to send you a short video clip tomorrow to illustrate what I am
> seeing on my machine. It is possible there is something machine-dependent? We
> are using a new Skylake machine.
>
> -- Dave W (sent from my phone, so please excuse brevity, speak-o's, and
> swype-o's)
>
> On July 12, 2016 6:41:37 PM EDT, Elliot Ronaghan <[email protected]> wrote:
>
>>> It looks like the
>>>
>>>   -sparallelInitElts=false
>>>
>>> setting restored our performance to what we got with 1.11. Surprisingly (to
>>> me), without this flag, the code seems to be using multicore execution of
>>> the loops I've written, as well as perhaps the array initialization (at
>>> least, that's what it looks like when I watch htop as I run it, and all 4
>>> cores run at 100% right up until it quits 12 seconds after starting). That
>>> would probably explain the terrible performance, as the untiled loop nest
>>> would probably cause terrible contention for cache lines if run
>>> concurrently.
>>
>> That flag only impacts array initialization.
>> Your for-loops will still run
>> serially (Chapel, very intentionally, does not auto-parallelize anything.)
>> When I run, I see a spike for all cores during array initialization, then
>> just one core busy for the rest of the program.
>
>>> I'm attaching our code, but my question may now just be: how do I prevent
>>> concurrent execution (and the answer may be, with the flag above).
>>
>> For now, I'd just use -sparallelInitElts=false. Currently there's no way to
>> squash the default array initialization, but we're working on that. In the
>> future you should be able to get serial array init by doing something like:
>>
>>   // replace default init with manual serial init
>>   var m: [MatrixD] int = for i in MatrixD do 0;
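Putting the two workarounds discussed in this thread together, here is a minimal self-contained sketch (the file name, `n`, and `MatrixD` are illustrative, not from the attached code):

```chapel
// serialinit.chpl -- hypothetical example
config const n = 1000;
const MatrixD = {1..n, 1..n};

// Replace the default (parallel) element initialization with a manual
// serial loop expression, so first-touch happens from a single task.
var m: [MatrixD] int = for idx in MatrixD do 0;

writeln(m[1, 1]);
```

Alternatively, leave the declaration as a plain `var m: [MatrixD] int;` and set the config param at compile time, which (assuming the flag works as described above) squashes parallel element initialization program-wide:

  chpl -sparallelInitElts=false serialinit.chpl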
