I'm not sure I follow -- are you saying that Open MPI is disabling the large mmap allocations, and we shouldn't?
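To make sure I understand the tweak being discussed: as far as I know, the two environment variables Eugene cites below are just the glibc knobs M_MMAP_MAX and M_TRIM_THRESHOLD, so the "equivalent mallopt() calls" would be something along these lines (a minimal sketch of plain glibc usage; whether an MPI library should be doing this on the application's behalf is exactly the question):

    #include <malloc.h>   /* glibc: mallopt() and the M_* tunables */

    int main(void)
    {
        /* Never satisfy large allocations with mmap(); everything comes
           from the heap, so freed buffers stay in the process (and any
           registration/pinning of them stays valid). */
        mallopt(M_MMAP_MAX, 0);

        /* Never trim the heap back to the OS, i.e. never give memory
           back behind the application's back. */
        mallopt(M_TRIM_THRESHOLD, -1);

        /* ... the application's real allocations happen after this ... */
        return 0;
    }

Obviously these calls have to run before the application does its big allocations to have any effect.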
On Jan 8, 2010, at 9:25 AM, Sylvain Jeaugey wrote:

> On Thu, 7 Jan 2010, Eugene Loh wrote:
>
> > Could someone tell me how these settings are used in OMPI or give any
> > guidance on how they should or should not be used?
>
> This is a very good question :-) Like this whole e-mail, though, it's hard
> (in my opinion) to give it a Good (TM) answer.
>
> > This means that if you loop over the elements of multiple large arrays
> > (which is common in HPC), you can generate a lot of cache conflicts,
> > depending on the cache associativity.
>
> On the other hand, high buffer alignment sometimes gives better
> performance (e.g. Infiniband QDR bandwidth).
>
> > There are multiple reasons one might want to modify the behavior of the
> > memory allocator, including the high cost of mmap calls, wanting to
> > register memory for faster communications, and now this cache-conflict
> > issue. The usual solution is
> >
> >   setenv MALLOC_MMAP_MAX_ 0
> >   setenv MALLOC_TRIM_THRESHOLD_ -1
> >
> > or the equivalent mallopt() calls.
>
> But yes, this set of settings is the number one tweak on HPC codes that
> I'm aware of.
>
> > This issue becomes an MPI issue for at least three reasons:
> >
> > *) MPI may care about these settings due to memory registration and
> > pinning. (I invite you to explain to me what I mean. I'm talking over
> > my head here.)
>
> Avoiding mmap is good, since it avoids calls to munmap (a function we have
> to intercept to prevent data corruption).
>
> > *) (Related to the previous bullet) MPI performance comparisons may
> > reflect these effects. Specifically, in comparing the performance of
> > OMPI, Intel MPI, Scali/Platform MPI, and MVAPICH2, some tests (such as
> > HPCC and SPECmpi) have shown large performance differences between the
> > various MPIs when, it seems, none of them were actually spending much
> > time in MPI. Rather, some MPI implementations were turning off
> > large-malloc mmaps and getting good performance (and sadly OMPI looked
> > bad in comparison).
>
> I don't think this bullet is related to the previous one. The first one is
> a good reason; this one is typically the Bad reason. Bad, but unfortunately
> true: competitors' MPI libraries are faster because ... they do much more
> than MPI (accelerating malloc being the main difference). Which I think is
> Bad, because all these settings should be left in the developer's hands.
> You'll always find an application where these settings waste memory and
> prevent the application from running.
>
> > *) These settings seem to be desirable for HPC codes since they don't do
> > much allocation/deallocation and they do tend to have loop nests that
> > wade through multiple large arrays at once. For best "out of the box"
> > performance, a software stack should turn these settings on for HPC.
> > Codes don't typically identify themselves as "HPC", but some indicators
> > include Fortran, OpenMP, and MPI.
>
> In practice, I agree. Most HPC codes benefit from it. But I have also run
> into codes where the memory waste was a problem.
>
> > I don't know the full scope of the problem, but I've run into this with
> > at least HPCC STREAM (which shouldn't depend on MPI at all, but OMPI
> > looks much slower than Scali/Platform on some tests) and SPECmpi
> > (primarily one or two codes, though it also depends on the problem size).
>
> I also had those codes in mind.
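(FWIW, HPCC STREAM is pretty much exactly the pattern Eugene describes above: loop nests wading through several large arrays at once. Here is a rough sketch of that kind of kernel, a hand-rolled triad rather than STREAM's actual source, to show why the allocator behavior matters: if each array comes from its own mmap(), which is glibc's default for allocations above the mmap threshold, all three buffers start at essentially the same page offset, so corresponding elements tend to land in the same cache sets and evict each other when the cache associativity is low.)

    #include <stdlib.h>

    #define N (4 * 1024 * 1024)   /* 32 MB per array: well above the mmap threshold */

    /* STREAM-triad-like kernel: three large arrays touched in lockstep. */
    static void triad(double *a, const double *b, const double *c, double s)
    {
        for (size_t i = 0; i < N; i++)
            a[i] = b[i] + s * c[i];
    }

    int main(void)
    {
        /* With the default allocator each of these is a separate mmap(),
           so a, b and c share (nearly) the same low address bits.  With
           MALLOC_MMAP_MAX_=0 they come from the heap instead and are
           naturally staggered. */
        double *a = malloc(N * sizeof(double));
        double *b = malloc(N * sizeof(double));
        double *c = malloc(N * sizeof(double));
        if (!a || !b || !c)
            return 1;

        triad(a, b, c, 3.0);

        free(a);
        free(b);
        free(c);
        return 0;
    }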
> That's also why I don't like those MPI "benchmarks": they benchmark much
> more than MPI, and they hence encourage MPI providers to incorporate into
> their libraries things that have (more or less) nothing to do with MPI.
>
> But again, yes, from the (basic) user's point of view, library X seems
> faster than library Y. When there is nothing left to improve in MPI, start
> optimizing the rest ... maybe we should reimplement a faster libc inside
> MPI :-)
>
> Sylvain

-- 
Jeff Squyres
jsquy...@cisco.com