It looks like the

-sparallelInitElts=false

setting restored our performance to what we got with 1.11. Surprisingly (to me), without this flag, the code seems to be using multicore execution /of the loops I've written/, as well as perhaps the array initialization (at least, that's what it looks like when I watch htop as I run it, and all 4 cores run at 100% right up until it quits 12 seconds after starting). That would probably explain the terrible performance, as the untiled loop nest would probably cause terrible contention for cache lines if run concurrently.

I'm attaching our code, but my question may now just be: how do I prevent concurrent execution? (The answer may simply be the flag above.)

Dave W


On 07/12/2016 12:49 PM, Brad Chamberlain wrote:

One other possibility that occurred to me groggily this morning is that in version 1.12.0 we made the --fast flag no longer throw --no-ieee-float by default (a flag which permits the back-end compiler to use relaxed IEEE floating point semantics). Specifically, we decided that --fast shouldn't result in potentially surprising semantic changes like this and that the user should have to request it explicitly.

To verify whether this is the cause or not, you ought to be able to compile a version 1.11 program with --ieee-float *after* your --fast flag or a version 1.13 program with --no-ieee-float in addition to your --fast flag to see if that reduces the performance gap you're seeing.
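
As a rough sketch of that experiment, assuming the program is a single source file called nussinov.chpl (a placeholder name, as are the output names), the builds might look like this:

```shell
# Placeholder source file name (an assumption); substitute your own.
SRC=nussinov.chpl

# Do nothing if the Chapel compiler or the source file is not available.
if command -v chpl >/dev/null 2>&1 && [ -f "$SRC" ]; then
  # With 1.12 or later: --fast no longer implies relaxed floating point,
  # so request it explicitly.
  chpl --fast --no-ieee-float "$SRC" -o nussinov_relaxed

  # With 1.11 instead: ask for strict IEEE semantics. The flag has to
  # come after --fast so that it overrides what --fast turned on.
  # chpl --fast --ieee-float "$SRC" -o nussinov_strict
fi
```

If the --no-ieee-float build closes most of the gap, the floating point change is the likely cause; if not, it can probably be ruled out.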

-Brad


On Mon, 11 Jul 2016, Elliot Ronaghan wrote:

Hi Dave,

Chapel's performance has significantly improved over the last few releases,
so it's surprising that you would see a 2X slowdown after upgrading. Without
seeing the code, my initial guess is that you might be getting bad
first-touch for some arrays now. Prior to 1.11 we serially initialized
arrays, but we switched to parallel initialization by default in 1.12. If
you're using a machine with multiple NUMA domains (it sounds like you are
since you have multiple CPUs) and your code is serial, this could cause a
slowdown.

You can check if parallel array initialization is causing the slowdown by
compiling with `-sparallelInitElts=false` and seeing if you get your old
performance back. That's a big hammer, but it will at least tell us if
that's the cause of the performance loss.
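
A minimal sketch of that check, assuming the program is a single source file called nussinov.chpl (a placeholder name, as are the output names):

```shell
# Placeholder source file name (an assumption); substitute your own.
SRC=nussinov.chpl

# Do nothing if the Chapel compiler or the source file is not available.
if command -v chpl >/dev/null 2>&1 && [ -f "$SRC" ]; then
  # Build 1: default settings, arrays initialized in parallel.
  chpl --fast "$SRC" -o nussinov_par_init

  # Build 2: same flags, but with parallel array initialization disabled.
  chpl --fast -sparallelInitElts=false "$SRC" -o nussinov_ser_init

  # Run both and compare the wall-clock times.
  time ./nussinov_par_init
  time ./nussinov_ser_init
fi
```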

Feel free to send your code along, and note that there's no problem with
installing multiple versions of Chapel.

Elliot

We've upgraded from Chapel 1.11 to 1.13 recently, and we're seeing a
drop-off in performance (by about a factor of two) of our Chapel code for
Nussinov's Algorithm for RNA sequence alignment. Has anyone else noticed a
difference? Does anyone have easy access to both 1.11 and 1.13 and time to
verify this result if I send the specific code? We could perhaps try to
install two versions at once, but if this is a known problem, I don't want
to bother, and if it is new, I thought this might be easier for the
developers to confirm, and interesting if it is indeed a change.

Note that we also did a hardware upgrade recently, and it is possible that
the old numbers come from the old hardware, but I wouldn't expect a
hardware upgrade to make things slower ... in case anyone cares, we went
from first-generation i7s (i7-860, I think?) to new i5-6500s, and this is
a single-threaded code.

Thanks for any insight anyone can provide,
  Dave Wonnacott



_______________________________________________
Chapel-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/chapel-users


Attachment: autoparallel_slower.tgz
Description: application/compressed-tar

