It looks like the -sparallelInitElts=false setting restored our performance to what we got with 1.11. Surprisingly (to me), without this flag the code seems to be running /the loops I've written/ on multiple cores, and perhaps the array initialization too. At least, that's what it looks like when I watch htop as I run it: all 4 cores run at 100% right up until it quits, 12 seconds after starting. That would probably explain the terrible performance, since running the untiled loop nest concurrently would likely cause heavy contention for cache lines.
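For readers who haven't seen it, here is a minimal sketch of the textbook Nussinov base-pair-maximization recurrence written as a plain, untiled i/j/k loop nest. It is the kind of loop nest referred to above, not the attached code, which may differ. The pairing rules (Watson-Crick plus G-U wobble) and the `min_loop` hairpin parameter are assumptions chosen for illustration. The comments note the dependence pattern: each cell reads only cells with a smaller j - i, so cells on the same diagonal are independent of one another.

```python
# Textbook Nussinov base-pair maximization (untiled, O(n^3) time).
# Assumptions (not from the attached code): Watson-Crick plus G-U wobble
# pairs, and a configurable minimum hairpin-loop length `min_loop`.

PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq: str, min_loop: int = 0) -> int:
    """Return the maximum number of nested base pairs in `seq`."""
    n = len(seq)
    if n == 0:
        return 0
    # N[i][j] = best score for the subsequence seq[i..j] (inclusive).
    N = [[0] * n for _ in range(n)]
    # i runs downward and j upward, so every cell that N[i][j] reads
    # (row i to its left, rows below it) is already final.
    for i in range(n - 1, -1, -1):
        for j in range(i + 1, n):
            best = max(N[i + 1][j], N[i][j - 1])  # leave i or j unpaired
            if j - i > min_loop and (seq[i], seq[j]) in PAIRS:
                inner = N[i + 1][j - 1] if i + 1 <= j - 1 else 0
                best = max(best, inner + 1)  # pair i with j
            for k in range(i + 1, j):  # split into seq[i..k] and seq[k+1..j]
                best = max(best, N[i][k] + N[k + 1][j])
            N[i][j] = best
    return N[0][n - 1]


if __name__ == "__main__":
    print(nussinov("GGGAAAUCC"))  # → 3 (only one U and two Cs can pair)
```

Because the innermost k loop walks along row i and down column j, running the outer loops concurrently without tiling touches widely separated parts of N from different cores.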
I'm attaching our code, but my question may now just be: how do I prevent concurrent execution? (The answer may be: with the flag above.)
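To run the comparisons suggested in this thread side by side, a small driver script can help. The following is a minimal sketch, assuming `chpl` is on PATH and a hypothetical source file named `nussinov.chpl`. The flags are the ones discussed in the thread (`--fast`, `-sparallelInitElts=false`, `--no-ieee-float`, and `--ieee-float` for a 1.11 compiler). The script itself is not part of Chapel. If the compiler or source file is missing, it only prints the commands it would run.

```python
# Sketch: build and time one Chapel program under the flag variants
# suggested in this thread. SOURCE is a hypothetical file name, and
# chpl is assumed to be on PATH; nothing here is part of Chapel itself.
import os
import shutil
import subprocess
import time

SOURCE = "nussinov.chpl"  # hypothetical: substitute the real source file

# (label, chpl flags). Flags are the ones discussed in the thread. For a
# 1.11 compiler, Brad's check is ["--fast", "--ieee-float"] instead.
VARIANTS = [
    ("fast", ["--fast"]),
    ("fast-serial-init", ["--fast", "-sparallelInitElts=false"]),
    ("fast-relaxed-float", ["--fast", "--no-ieee-float"]),
]


def build_command(flags, source=SOURCE, output="a.out"):
    """Return the chpl argument list for one variant."""
    return ["chpl", *flags, source, "-o", output]


def time_variant(label, flags, source=SOURCE):
    """Compile one variant, run it once, and return wall-clock seconds."""
    exe = os.path.join(".", label)
    subprocess.run(build_command(flags, source, exe), check=True)
    start = time.perf_counter()
    subprocess.run([exe], check=True)
    return time.perf_counter() - start


if __name__ == "__main__":
    if shutil.which("chpl") is None or not os.path.exists(SOURCE):
        # Dry run: just show the commands that would be executed.
        for label, flags in VARIANTS:
            print(f"{label}: {' '.join(build_command(flags))}")
    else:
        for label, flags in VARIANTS:
            print(f"{label}: {time_variant(label, flags):.2f} s")
```

Each variant runs only once, so the timings will be noisy. Repeating each run and keeping the minimum is a simple refinement.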
Dave W

On 07/12/2016 12:49 PM, Brad Chamberlain wrote:
One other possibility that occurred to me groggily this morning is that in version 1.12.0 we changed the --fast flag so that it no longer implies --no-ieee-float by default (a flag that permits the back-end compiler to use relaxed IEEE floating-point semantics). Specifically, we decided that --fast shouldn't result in potentially surprising semantic changes like this, and that the user should have to request it explicitly.

To verify whether this is the cause, you ought to be able to compile a version 1.11 program with --ieee-float *after* your --fast flag, or a version 1.13 program with --no-ieee-float in addition to your --fast flag, and see whether that reduces the performance gap you're seeing.

-Brad

On Mon, 11 Jul 2016, Elliot Ronaghan wrote:

Hi Dave,

Chapel's performance has significantly improved over the last few releases, so it's surprising that you would see a 2X slowdown after upgrading. Without seeing the code, my initial guess is that you might now be getting bad first-touch placement for some arrays. Through 1.11 we serially initialized arrays, but we switched to parallel initialization by default in 1.12. If you're using a machine with multiple NUMA domains (it sounds like you are, since you have multiple CPUs) and your code is serial, this could cause a slowdown.

You can check whether parallel array initialization is causing the slowdown by compiling with `-sparallelInitElts=false` and seeing if you get your old performance back. That's a big hammer, but it will at least tell us if that's the cause of the performance loss. Feel free to send your code along, and note that there's no problem with installing multiple versions of Chapel.

Elliot

We've upgraded from Chapel 1.11 to 1.13 recently, and we're seeing a drop-off in performance (by about a factor of two) of our Chapel code for Nussinov's algorithm for RNA secondary-structure prediction. Has anyone else noticed a difference? Does anyone have easy access to both 1.11 and 1.13, and time to verify this result if I send the specific code?
We could perhaps try to install two versions at once, but if this is a known problem, I don't want to bother; if it is new, I thought this might be easier for the developers to confirm, and interesting if it is indeed a change.

Note that we also did a hardware upgrade recently, and it is possible that the old numbers come from the old hardware, but I wouldn't expect a hardware upgrade to make things slower. In case anyone cares, we went from first-generation i7s (i7-860, I think?) to new i5-6500s, and this is a single-threaded code.

Thanks for any insight anyone can provide,
Dave Wonnacott

_______________________________________________
Chapel-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/chapel-users
autoparallel_slower.tgz
Description: application/compressed-tar
