Gilles, a short update about the patched version (3.0.0). After we updated from CentOS 7.3 to 7.4, this version, built with all versions of all compilers, stopped working with a message like:

$ a.out: symbol lookup error: /opt/MPI/openmpi-3.0.0p/linux/intel_17.0.5.239/lib/openmpi/mca_mpool_memkind.so: undefined symbol: memkind_get_kind_by_partition
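(As a side note: a quick way to check whether a given library actually exports the missing symbol - essentially the lookup the runtime linker performs when mca_mpool_memkind.so is loaded - is a small dlopen()/dlsym() probe. The sketch below is only an illustration, not part of Open MPI; the library path and symbol name are simply taken from the error message above.)

/* check_symbol.c - ask the dynamic loader whether a shared library
 * exports a given symbol.
 * Build: gcc check_symbol.c -ldl -o check_symbol
 * Usage: ./check_symbol /usr/lib64/libmemkind.so.0.0.1 memkind_get_kind_by_partition
 */
#include <dlfcn.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s <library> <symbol>\n", argv[0]);
        return 2;
    }
    void *handle = dlopen(argv[1], RTLD_NOW | RTLD_LOCAL);
    if (!handle) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return 2;
    }
    dlerror();                             /* clear any stale error state */
    void *sym = dlsym(handle, argv[2]);
    if (dlerror() != NULL || sym == NULL) {
        printf("symbol '%s' NOT found in %s\n", argv[2], argv[1]);
        dlclose(handle);
        return 1;
    }
    printf("symbol '%s' found in %s\n", argv[2], argv[1]);
    dlclose(handle);
    return 0;
}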
A quick look under the hood showed that the memkind package containing libmemkind.so.0.0.1 was updated from memkind-1.4.0-1.el7.x86_64 to memkind-1.5.0-1.el7.x86_64 by the CentOS update, and that the older one really does contain the 'memkind_get_kind_by_partition' symbol, while the new one does not have this symbol anymore. So I will rebuild this version and see what happens (the regular v3.0.0 did not have the issue). Nevertheless I wanted to give you a report about this side effect of your patch ... [after reading the fine Google search results first] it is likely another case of the type https://github.com/open-mpi/ompi/issues/4466

The most surprising thing is that only one version of Open MPI (the patched 3.0.0) stopped working instead of all of them. Seems we're lucky. WOW. Will report on the results of the 3.0.0p rebuild.

Best,
Paul Kapinos

$ objdump -S /usr/lib64/libmemkind.so.0.0.1 | grep -i memkind_get_kind_by_partition
0000000000007f70 <memkind_get_kind_by_partition>:
    7f76:   77 19    ja     7f91 <memkind_get_kind_by_partition+0x21>
    7f89:   74 06    je     7f91 <memkind_get_kind_by_partition+0x21>

On 10/12/2017 11:21 AM, Gilles Gouaillardet wrote:
> Paul,
>
> Sorry for the typo.
>
> The patch was developed on the master branch.
> Note that v1.10 is no longer supported, and since passive wait is a new feature, it
> would start at v3.1 or later.
>
> That being said, if you are kind of stuck with 1.10.7, I can try to craft a
> one-off patch in order to help.
>
>
> Cheers,
>
> Gilles
>
> Paul Kapinos <kapi...@itc.rwth-aachen.de> wrote:
>> Hi Gilles,
>> Thank you for your message and the quick patch!
>>
>> You likely mean (instead of the links in your email below)
>> https://github.com/open-mpi/ompi/pull/4331 and
>> https://github.com/open-mpi/ompi/pull/4331.patch
>> for your PR #4331 (note '4331' instead of '4431' :-)
>>
>> I was not able to patch the 1.10.7 release - likely because you develop on a
>> much, much newer version of Open MPI.
>>
>> Q1: on *which* release should patch #4331 be applied?
>>
>> Q2: I assume it is unlikely that this patch will be back-ported to 1.10.x?
>>
>> Best
>> Paul Kapinos
>>
>>
>> On 10/12/2017 09:31 AM, Gilles Gouaillardet wrote:
>>> Paul,
>>>
>>> I made PR #4331 https://github.com/open-mpi/ompi/pull/4431 in order to
>>> implement this.
>>>
>>> In order to enable passive wait, you simply need to
>>>
>>> mpirun --mca mpi_poll_when_idle true ...
>>>
>>> FWIW, when you use mpi_yield_when_idle, Open MPI does (highly oversimplified)
>>>
>>> for (...) sched_yield();
>>>
>>> As you already noted, top shows 100% CPU usage (a closer look shows the usage is
>>> in the kernel and not in user space).
>>>
>>> That being said, since the process is only yielding, the other running processes
>>> will get most of their time slices,
>>> and hence the system remains pretty responsive.
>>>
>>> Can you please give this PR a try?
>>>
>>> The patch can be manually downloaded at
>>> https://github.com/open-mpi/ompi/pull/4431.patch
>>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>> On 10/12/2017 12:37 AM, Paul Kapinos wrote:
>>>> Dear Jeff,
>>>> Dear All,
>>>>
>>>> we know about the *mpi_yield_when_idle* parameter [1]. We read [2]. You're right,
>>>>> if an MPI application is waiting a long time for messages,
>>>>> perhaps its message passing algorithm should be re-designed
>>>> ... but we cannot persuade the ParaView/VTK developers to rewrite their software,
>>>> famous for busy-waiting at N x 100% CPU load on any user mouse move [3].
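(For illustration: a minimal reproducer of this barrier-spin pattern - a sketch only, not the attachment referenced further down - could look like the following. Rank 0 simulates a long wait for user interaction while the remaining ranks sit in MPI_Barrier and, with the default aggressive progress, show ~100% CPU in top.)

/* barrier_spin.c - minimal sketch of a barrier busy-wait reproducer.
 * Build: mpicc barrier_spin.c -o barrier_spin
 * Run:   mpirun -np 4 ./barrier_spin
 *        mpirun --mca mpi_yield_when_idle 1 -np 4 ./barrier_spin
 */
#include <mpi.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        sleep(60);                    /* stand-in for "waiting on user interaction" */
    }
    MPI_Barrier(MPI_COMM_WORLD);      /* the other ranks busy-wait here */

    MPI_Finalize();
    return 0;
}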
>>>>
>>>> It turned out that
>>>> a) (at least some of) the spin time is in the MPI_Barrier call (waiting for user
>>>> interaction)
>>>> b) for Intel MPI and MPICH we found a way to disable this busy wait [4]
>>>>
>>>> c) But, for both 'pvserver' and a minimal example (attached), we were not able to
>>>> stop the busy waiting with Open MPI: setting the *mpi_yield_when_idle* parameter to
>>>> '1' just seems to move the spin activity from user space to the kernel, while staying
>>>> at 100%, cf. the attached screenshots and [5]. The behaviour is the same for 1.10.4
>>>> and 2.0.2.
>>>>
>>>> Well, the question: is there a way/a chance to effectively disable the busy wait
>>>> using Open MPI?
>>>>
>>>> Best,
>>>>
>>>> Paul Kapinos
>>>>
>>>> [1] http://www.open-mpi.de/faq/?category=running#force-aggressive-degraded
>>>> [2] http://blogs.cisco.com/performance/polling-vs-blocking-message-passingprogress
>>>> [3] https://www.paraview.org/Wiki/Setting_up_a_ParaView_Server#Server_processes_always_have_100.25_CPU_usage
>>>> [4] https://public.kitware.com/pipermail/paraview-developers/2017-October/005587.html
>>>> [5] https://serverfault.com/questions/180711/what-exactly-do-the-colors-in-htop-status-bars-mean

-- 
Dipl.-Inform. Paul Kapinos - High Performance Computing,
RWTH Aachen University, IT Center
Seffenter Weg 23, D 52074 Aachen (Germany)
Tel: +49 241/80-24915