On May 31, 2014, at 10:32 AM, "Lecrenski, Stephen K PW" <stephen.lecren...@pw.utc.com> wrote:
> This case was a very simple 6 process test on a single node which ran to
> completion.
>
> I'm installing mpi 1.8.1 now to see if I see the same issue.
>
> I just installed and ran hwloc. What am I looking for? I see basic
> information PCI (ib0, ib1, mix4_0) PCI(eth0) PCI(eth1) PCI() PCI(sda) and
> others...

The fact that it ran without hanging for a huge period of time is a good
sign; that's really all I was looking for.

> When I launch the mpi process I'm using mpirun --mca btl self,sm,openib

That should be fine.

> I have not explicitly specified in mpirun to use processor affinity. When
> running top(1) I do see that the processes migrate from core to core from
> time to time.

With 1.6.x, that sounds good.  That does make it weirder, though -- you
weren't using affinity, but you were spending giant amounts of time in the
affinity code.  Strange.

With 1.8.x, OMPI enables affinity by default.  Let's see what happens with
1.8.x -- if upgrading solves your problem, that would be best.
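For reference, a rough sketch of the binding-related mpirun options for both
series (the executable name below is a placeholder, and option spellings can
vary slightly between versions -- check mpirun(1) for your build):

  # 1.6.x: binding is off by default; to bind each rank to a core and
  # print the chosen bindings at launch:
  mpirun --bind-to-core --report-bindings --mca btl self,sm,openib -np 6 ./your_app

  # 1.8.x: binding is on by default; to see what it chose:
  mpirun --report-bindings --mca btl self,sm,openib -np 6 ./your_app

  # 1.8.x: to turn binding off again (closer to your 1.6.5 runs):
  mpirun --bind-to none --mca btl self,sm,openib -np 6 ./your_app

With binding in effect, top(1) should show each rank staying on one core
instead of migrating.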
> Am I using processor affinity and if so shouldn't the process(es) remain on
> each individual core throughout execution? Hyperthreading is off. I am not
> using a rank file nor specifying the mpirun command to explicitly use
> processor affinity.
>
> skl
> 860-557-2895
>
> -----Original Message-----
> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Jeff Squyres
> (jsquyres)
> Sent: Saturday, May 31, 2014 8:13 AM
> To: Open MPI Developers
> Subject: [External] Re: [OMPI devel] Open MPI 1.6.5
> opal_paffinity_base_get_physical_socket_id
>
> The super short answer is: 1.6.x is old and deprecated; can you upgrade to
> the 1.8.x series?
>
> The short answer is "no" -- paffinity calls should never block, but it
> depends on how and what you're measuring.
>
> The more detailed answer is: your trace below looks like it includes a call
> to MPI_Abort. Did your process hang during the abort, perchance, and
> (somehow) get stuck in a process affinity call?
>
> Are you able to download and run the lstopo command from the hwloc suite?
> (http://www.open-mpi.org/software/hwloc/v1.9/)
>
> On May 30, 2014, at 2:47 PM, "Lecrenski, Stephen K PW"
> <stephen.lecren...@pw.utc.com> wrote:
>
>> I am running some performance tests (Open SpeedShop) with a program which
>> uses Open MPI and Infiniband.
>>
>> I see a line of code which appears to be taking a considerable amount of
>> cpu time in relation to other pieces of the code.
>>
>> opal_paffinity_base_get_physical_socket_id (libmpi.so.1.0.8:
>> paffinity_base_wrappers.c,118)
>>
>> Exclusive CPU time (s)   % of CPU time   Statement location (line number)
>> 19031.94                 38.339796       paffinity_base_wrappers.c(118)
>> 14188.66                 28.583021       paffinity_base_wrappers.c(113)
>> 10934.38                 22.027282       paffinity_base_wrappers.c(129)
>>  2185.16                  4.401999       paffinity_base_wrappers.c(121)
>>  1081.96                  2.179606       paffinity_base_wrappers.c(130)
>>   546.93                  1.101789       paffinity_base_wrappers.c(114)
>>   546.17                  1.100258       paffinity_base_wrappers.c(65)
>>   541.67                  1.091193       paffinity_base_wrappers.c(126)
>>   540.52                  1.088876       ompi_mpi_abort.c(80)
>>     2.23                  0.004492       ompi_mpi_abort.c(101)
>>
>> Is this normal behavior?
>>
>> Thanks,
>>
>> Stephen Lecrenski
>> High Performance Technical Computing
>>
>> Pratt & Whitney
>> 400 Main Street
>> East Hartford, CT 06108
>> Telephone: 860 - 557 - 2895
>> Email: stephen.lecren...@pw.utc.com

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
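A short sketch of hwloc-side commands for the "what am I looking for"
question above; the PCI lines can be suppressed so that the socket/core
layout stands out (<pid> is a placeholder for one of the MPI ranks):

  # Show only packages, caches, cores, and PUs -- no I/O devices:
  lstopo --no-io

  # List processes that are currently bound to a subset of the machine:
  hwloc-ps

  # Print the set of cores a given process is allowed to run on:
  taskset -cp <pid>

A bound rank reports a small, stable core list; an unbound rank reports
every core, which matches the migration seen in top(1).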