Shannon V. Davidson
Sat, 26 Jul 2008 13:02:45 -0700
Schoenefeld, Keith wrote:
This definitely looked promising, but unfortunately it didn't work. I both added the appropriate export lines to my qsub file, and then when that didn't work I checked the mvapich.conf file and confirmed that the processor affinity was disabled. I wonder if I can turn it on and make it work, but unfortunately the cluster is full at the moment, so I can't test it.
You may want to verify that the environment variable was actually passed down to the MPI task. To set environment variables for MPI jobs, I usually either specify the environment variable on the mpirun command line or in a wrapper script:
    mpirun -np 32 -hostfile nodes VIADEV_ENABLE_AFFINITY=0 a.out
    mpirun -np 32 -hostfile nodes run.sh a.out

where run.sh sets up the local environment, including environment variables.
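For illustration, a minimal sketch of what such a run.sh wrapper might look like. The script body is an assumption and not taken from any MVAPICH documentation. RUN_SH_DEBUG is a made-up switch for this example. It prints what each task actually sees, which is one way to check that the variables reached the MPI task.

```shell
#!/bin/sh
# run.sh: hypothetical wrapper, launched as
#   mpirun -np 32 -hostfile nodes run.sh a.out

# Turn off MVAPICH CPU affinity. Both variable names mentioned in this
# thread are set; a name the library does not recognize is just an
# ordinary, unused environment variable.
VIADEV_USE_AFFINITY=0
VIADEV_ENABLE_AFFINITY=0
export VIADEV_USE_AFFINITY VIADEV_ENABLE_AFFINITY

# Optional diagnostics (RUN_SH_DEBUG is an invented switch for this sketch):
# report, per task, the host and the values that will be inherited.
if [ -n "${RUN_SH_DEBUG:-}" ]; then
    echo "$(hostname): VIADEV_USE_AFFINITY=$VIADEV_USE_AFFINITY VIADEV_ENABLE_AFFINITY=$VIADEV_ENABLE_AFFINITY" >&2
    # Linux only: list the CPUs this process is currently allowed to run on.
    if [ -r /proc/self/status ]; then
        grep Cpus_allowed_list /proc/self/status >&2
    fi
fi

# Replace the wrapper with the real program (for example a.out), if given.
if [ "$#" -gt 0 ]; then
    exec "$@"
fi
```

Setting RUN_SH_DEBUG=1 in the job environment before the mpirun line would make each task print its settings to stderr, assuming that variable itself is passed through to the tasks.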
The second method is more portable to various shells and MPI versions.

Shannon
-- KS

-----Original Message-----
From: Shannon V. Davidson [EMAIL PROTECTED]
Sent: Wednesday, July 23, 2008 4:02 PM
To: Schoenefeld, Keith
Cc: beowulf@beowulf.org
Subject: Re: [Beowulf] Strange SGE scheduling problem

Schoenefeld, Keith wrote:
My cluster has 8 slots (cores)/node in the form of two quad-core processors. Only recently we've started running jobs on it that require 12 slots. We've noticed significant speed problems running multiple 12 slot jobs, and quickly discovered that the node that was running 4 slots on one job and 4 slots on another job was running both jobs on the same processor cores (i.e. both job1 and job2 were running on CPUs #0-#3, and CPUs #4-#7 were left idling). The result is that the jobs were competing for time on half the processors that were available. In addition, a 4 slot job started well after the 12 slot job has ramped up results in the same problem (both the 12 slot job and the four slot job get assigned to the same slots on a given node).

Any insight as to what is occurring here and how I could prevent it from happening? We are using SGE + mvapich 1.0 and a PE that has the $fill_up allocation rule.

I have also posted this question to the [EMAIL PROTECTED] mailing list, so my apologies for people who get this email multiple times.

This sounds like MVAPICH is assigning your MPI tasks to your CPUs starting with CPU#0. If you are going to run multiple MVAPICH jobs on the same host, turn off CPU affinity by starting the MPI tasks with the environment variables VIADEV_USE_AFFINITY=0 and VIADEV_ENABLE_AFFINITY=0.

Cheers,
Shannon

Any help is appreciated.

-- KS

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf