On May 31, 2014, at 10:32 AM, "Lecrenski, Stephen K PW" 
<stephen.lecren...@pw.utc.com> wrote:

> This case was a very simple 6-process test on a single node, and it ran to 
> completion.
> 
> I'm installing Open MPI 1.8.1 now to see whether I see the same issue.
> 
> I just installed and ran hwloc.  What am I looking for?  I see basic 
> information: PCI (ib0, ib1, mlx4_0), PCI (eth0), PCI (eth1), PCI (), PCI (sda), 
> and others...

The fact that it ran to completion without hanging is a good sign; that's 
really all I was looking for.
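
If you ever want a plain-text dump of what lstopo sees (to paste into a mail, 
say), it can write to the console or to a file; the filename below is just an 
example:

    lstopo --of console     # print the topology as text to the terminal
    lstopo topo.txt         # output format is inferred from the file extension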

> When I launch the MPI processes, I'm using mpirun --mca btl self,sm,openib

That should be fine.
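
For reference, the full launch line would then look something like this; the 
process count and binary name are just placeholders, not taken from your setup:

    mpirun -np 6 --mca btl self,sm,openib ./your_app

That BTL list restricts Open MPI to the self (loopback), shared-memory, and 
InfiniBand (verbs) transports, which is a sensible set for IB-connected nodes.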

> I have not explicitly told mpirun to use processor affinity.  When running 
> top(1), I do see that the processes migrate from core to core from time to 
> time.

With 1.6.x, that sounds right.  It does make things stranger, though -- you 
weren't using affinity, yet the profile shows a large amount of time in the 
affinity code.  Strange.
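
If you want to try pinning the processes while you're still on 1.6.x (to see 
whether the profile changes), the 1.6-era options are --bind-to-core and 
--report-bindings; the binary name is again a placeholder:

    mpirun -np 6 --bind-to-core --report-bindings \
        --mca btl self,sm,openib ./your_app

--report-bindings makes mpirun print the core each rank was bound to at 
launch, and with binding on, the core-to-core migration you see in top(1) 
should disappear.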

With 1.8.x, OMPI enables affinity by default.
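
Either way, you can double-check what a running rank is actually bound to from 
the outside; taskset comes with util-linux and hwloc-ps comes with hwloc (the 
PID below is a placeholder for one of your ranks):

    taskset -pc <pid>     # prints the CPU list the process may run on
    hwloc-ps              # lists processes that hwloc sees as bound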

Let's see what happens with 1.8.x -- if upgrading solves your problem, that 
would be best.
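
Once you're on 1.8.x, the binding syntax changes a bit; to see where the ranks 
land, or to turn binding back off for an apples-to-apples comparison with your 
1.6 runs, something like this should work (./your_app is still a placeholder):

    mpirun -np 6 --report-bindings --mca btl self,sm,openib ./your_app
    mpirun -np 6 --bind-to none --mca btl self,sm,openib ./your_app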

> Am I using processor affinity, and if so, shouldn't each process remain on 
> its own core throughout execution?  Hyperthreading is off.  I am not using a 
> rank file, nor am I telling mpirun on the command line to explicitly use 
> processor affinity.
> 
> skl
> 860-557-2895
> 
> -----Original Message-----
> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Jeff Squyres 
> (jsquyres)
> Sent: Saturday, May 31, 2014 8:13 AM
> To: Open MPI Developers
> Subject: [External] Re: [OMPI devel] Open MPI 1.6.5 
> opal_paffinity_base_get_physical_socket_id
> 
> The super short answer is: 1.6.x is old and deprecated; can you upgrade to 
> the 1.8.x series?
> 
> The short answer is "no" -- paffinity calls should never block, but it 
> depends on how and what you're measuring.
> 
> The more detailed answer is: your trace below looks like it includes a call 
> to MPI_Abort.  Did your process hang during the abort, perchance, and 
> (somehow) get stuck in a process affinity call?
> 
> Are you able to download and run the lstopo command from the hwloc suite?  
> (http://www.open-mpi.org/software/hwloc/v1.9/)
> 
> 
> 
> 
> On May 30, 2014, at 2:47 PM, "Lecrenski, Stephen K PW" 
> <stephen.lecren...@pw.utc.com> wrote:
> 
>> I am running some performance tests (Open SpeedShop) with a program that 
>> uses Open MPI and InfiniBand. 
>> 
>> I see a line of code that appears to be taking a considerable amount of CPU 
>> time relative to other parts of the code.
>> 
>> opal_paffinity_base_get_physical_socket_id (libmpi.so.1.0.8: 
>> paffinity_base_wrappers.c,118)
>> 
>> Exclusive CPU time (sec)   % of CPU Time   Statement Location (Line Number)
>>                 19031.94       38.339796   paffinity_base_wrappers.c(118)
>>                 14188.66       28.583021   paffinity_base_wrappers.c(113)
>>                 10934.38       22.027282   paffinity_base_wrappers.c(129)
>>                  2185.16        4.401999   paffinity_base_wrappers.c(121)
>>                  1081.96        2.179606   paffinity_base_wrappers.c(130)
>>                   546.93        1.101789   paffinity_base_wrappers.c(114)
>>                   546.17        1.100258   paffinity_base_wrappers.c(65)
>>                   541.67        1.091193   paffinity_base_wrappers.c(126)
>>                   540.52        1.088876   ompi_mpi_abort.c(80)
>>                     2.23        0.004492   ompi_mpi_abort.c(101)
>> 
>> 
>> Is this normal behavior?
>> 
>> Thanks,
>> 
>> Stephen Lecrenski
>> High Performance Technical Computing
>> 
>> Pratt & Whitney
>> 400 Main Street
>> East Hartford, CT 06108
>> Telephone: 860-557-2895
>> Email: stephen.lecren...@pw.utc.com
>> 
> 
> 
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/
