On Aug 17, 2009, at 12:11 PM, N.M. Maclaren wrote:
1) To have a mandatory configuration option setting the default, which would
have a name like 'performance' for the binding option. You could then beat up
anyone who benchmarkets without it for being biassed. This is a better
solution, but the "I shouldn't need to have to think just because I am doing
something complicated" brigade would object.
Yes, BUT... We had a similar option to this for a long, long time.
Marketing departments from other organizations / companies willfully
ignored it whenever presenting competitive data. The 1,000,000th time
I saw this, I gave up arguing that our competitors were not being fair
and simply changed our defaults to always leave memory pinned for
OpenFabrics-based networks.
To be clear: the option was "--mca mpi_leave_pinned 1" -- granted, the name
wasn't as obvious as "--performance", but this option was widely publicized
and it was easy to know that you should use it for benchmarks (with a name
like --performance, the natural question would be "why don't you enable
--performance by default? Does this mean that OMPI has --no-performance by
default...?"). I would tell person/marketer X at a conference, "Hey, you
didn't run with leave_pinned; our numbers are much better than that."
"Oh, sorry," they would inevitably say; "I'll fix it next time I make new
slides."
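For reference, enabling it was just a matter of adding that MCA parameter to
the mpirun command line -- or, if memory serves, setting the equivalent
OMPI_MCA_ environment variable (the benchmark name below is just a
placeholder):

    # enable leave-pinned behavior (e.g., for OpenFabrics-based networks)
    mpirun --mca mpi_leave_pinned 1 -np 4 ./my_favorite_benchmark

    # same thing, via the environment
    export OMPI_MCA_mpi_leave_pinned=1
    mpirun -np 4 ./my_favorite_benchmark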
There are several problems that arise from this scenario:
1. The competitors aren't interested in being fair. Spin is
everything. HPC is highly competitive.
2. Even if you tag someone in public for not being fair, they always
say the same thing, "Oh sorry, my mistake" (regardless of whether they
actually forgot or did it intentionally). I told several competitors
*many times* that they had to use leave_pinned, but in all public
comparison numbers, they never did. Hence, they always looked better.
(/me takes a moment to calm down after venturing down memory lane of
all the unfair comparisons made against OMPI... :-) )
3. To some degree, "out of the box performance" *is* a compelling reason.
Sure, I would hope that marketers and competitors would be ethical (they
aren't, but you can hope anyway), but the naive / new user shouldn't need to
know a million switches to get good performance. Having good / simple
switches to optimize for different workloads is a good thing (e.g., Platform
MPI has some nice options for this kind of stuff). But the bottom line is
that you can't rely on someone running anything other than
"mpirun -np x my_favorite_benchmark".
-----
Also, as an aside to many of the other posts, yes, this is a complex
issue. But:
- We're only talking about defaults, not absolute behavior. If you
want or need to disable/change this behavior, you certainly can.
- It's been stated a few times, but I feel that this is important: most
other MPIs bind by default. They're deriving performance benefits from
this. We're not. Open MPI has to be competitive (or my management will ask
me, "Why are you working on that crappy MPI?").
- The Linux scheduler does not / cannot optimize well for many HPC apps;
binding definitely helps in many scenarios, not just benchmarks (see the
example after this list).
- Of course you can construct scenarios where things break / perform
badly. Particularly if you do Wrong Things. If you do Wrong Things,
you should be punished (e.g., via bad performance). It's not the
software's fault if you choose to bind 10 threads to 1 core. It's not
the software's fault if you're on a large SMP and you choose to
dedicate all of the processors to HPC apps and don't leave any for the
OS (particularly if you have a lot of OS activity). And so on. Of
course, we should do a good job of trying to do reasonable things by
default (e.g., not binding 10 threads to one core by default), and we
should provide options (sometimes automatic) for disabling those
reasonable things if we can't do them well. But sometimes we *do*
have to rely on the user telling us things.
- I took Ralph's previous remarks as a general statement about
threading being problematic to any form of binding. I talked to him
on the phone -- he actually had a specific case in mind (what I would
consider Wrong Behavior: binding N threads to 1 core).
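To illustrate the explicit binding I mentioned in the Linux scheduler bullet
above (assuming the MCA parameter name I remember is still current; the
application name is just a placeholder):

    # mpi_paffinity_alone=1 tells Open MPI that this job has its node(s) to
    # itself, so it binds each MPI process to its own processor
    mpirun --mca mpi_paffinity_alone 1 -np 8 ./my_hpc_app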
-----
Ralph and I chatted earlier; I would be OK with waiting for the other two
pieces of functionality to come in before we make binding occur by default:
1. Coordinate between multiple OMPI jobs on the same node to ensure that
they don't bind to the same cores (or at least print a warning).
2. Follow the binding directives of resource managers (SLURM, Torque, etc.).
Sun is free to enable binding-by-default in their ClusterTools distribution
if/whenever they want, of course. I fully understand their reasoning for
doing so. They're also in a better position to coach their users on when to
use which options, etc., because they have direct contact with their users
(vs. the community Open MPI, where hundreds of people download Open MPI a
day and we never hear from them). I *believe* that this approach is also OK
with Sun (I'm pretty sure Terry told me this last week), but I don't want to
speak for them.
My $0.02.
--
Jeff Squyres
jsquy...@cisco.com