Hi John -

How many cores does your system have? How much memory? How much memory is the 
program using? How many locales are you launching on the single system? How 
many threads are you assigning to each locale?

I’d bet that your problem is either different threads contending over the
same processor resources (which you can limit, as Greg pointed out, with
CHPL_RT_NUM_THREADS_PER_LOCALE) or using too much memory, since you are running
many locales on a single machine and Chapel doesn't currently try to reduce
its per-locale resource usage when oversubscribed in this manner.
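
For example, on a hypothetical 8-core box running two co-located locales of a
program ./myprog (both names made up for illustration), you might try:

    export CHPL_RT_NUM_THREADS_PER_LOCALE=4   # 8 cores / 2 co-located locales
    ./myprog -nl 2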

Of course, it could be the communication as well; you can check that. You can
also instrument your program to print out communication counts (as I described
in an earlier email; try mirroring the use of CommDiagnostics in
chapel-lang-github/test/performance/ferguson/remote-class-read.chpl).
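
A minimal pattern looks something like this (an untested sketch; the
on-statement is just there to force some remote traffic):

    use CommDiagnostics;

    var x = 0;
    resetCommDiagnostics();
    startCommDiagnostics();
    on Locales[numLocales-1] do x += 1;  // remote execution + remote read/write
    stopCommDiagnostics();
    writeln(getCommDiagnostics());       // per-locale communication counts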

You can also try running on a real cluster…

Cheers,

-michael

From: Greg Titus <[email protected]>
Date: Thursday, February 19, 2015 at 11:41 AM
To: John MacFrenz <[email protected]>
Cc: Michael Ferguson <[email protected]>,
"[email protected]" <[email protected]>
Subject: Re: Variable Block Distributions

Folks here pointed out a mistake I made in the discussion below: the 
environment variable that sets the number of threads and thus processor cores 
to use is CHPL_RT_NUM_THREADS_PER_LOCALE. I left off the '_RT' part of that 
variable name below. It won't work if it's not spelled right! :-)

greg


On 2/19/2015 8:45 AM, Greg Titus wrote:
Hi John --

(You already know the following as general knowledge, but I thought I'd include 
it for people newer to multi-locale programming who may be following this 
conversation.)

For context, with current Chapel it's normal for programs to suffer some 
performance loss when moving from single-locale to multi-locale execution. 
Using multiple locales offers more opportunity for parallelism, but at the cost 
of reduced intra-task performance due to network communication required for 
inter-locale variable references. The effect varies across programs depending 
on how much remote communication they do, but since a remote reference can 
easily take 1000 times as long as a local one, it doesn't take much to have a 
big effect. For example, with Chapel 1.10 on a Cray XC we don't see any
drop-off in performance from 1 locale to 2 on the Stream benchmark, but our
Stream doesn't do many inter-locale memory references. For the RA benchmark,
on the other hand, which does a lot of inter-locale references (in fact,
that's what it measures), our multi-locale performance doesn't match
single-locale performance until we get up to 8-32 locales, depending on
circumstances. And the Cray XC has a very high-performance network compared
to UDP or MPI over Ethernet.
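
As a tiny illustration (an untested sketch; the variable and loop are made up
for the example), every read of a Locale-0 variable from inside an
on-statement running on another locale turns into a remote get:

    var x = 42;            // allocated on Locale 0
    on Locales[1] {
      var sum = 0;
      for 1..1000 do
        sum += x;          // each read of 'x' crosses the network
      writeln(sum);
    }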

That said, the >100x slowdown you're seeing seems high unless your test case
really is doing a lot of remote references. If it isn't, or at least shouldn't
be, perhaps you're seeing a lot of remote communication for internal references
to metadata within your distribution code. If so, turning on remote caching
could well improve matters; in fact, that might be a good test to rule this
hypothesis in or out.
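
If you do try it, note that remote caching is enabled at compile time, along
these lines (myprog.chpl is just a stand-in for your source file):

    chpl --cache-remote myprog.chpl -o myprog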

A secondary effect with GASNET_SPAWNFN=L could be oversubscription of the
processor cores due to running more than one Chapel locale per compute node. To
reduce the level of oversubscription you could set CHPL_NUM_THREADS_PER_LOCALE
to the number of compute-node cores divided by the number of locales you're
running on the compute node, but don't set it to less than 2, or you could
deadlock/livelock due to internal starvation. However, if you're seeing the
same slowdown with GASNET_SPAWNFN=S and one Chapel locale per compute node,
then I don't think this is what's afflicting you right now.
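
For example, on a hypothetical node with 8 cores running 4 co-located locales,
you'd set it to 8 / 4 = 2, which also happens to be the minimum safe value.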

hope this helps,
greg


On 2/19/2015 4:42 AM, John MacFrenz wrote:
Hi,

I'll give --cache-remote a try later. However, for now I'm facing some problems
which definitely should be solved first...

The problem I'm having is that using GASNET_SPAWNFN=L with the UDP conduit and
more than one locale causes the program to run _very_ slowly. For example, with
one locale my test program took 0.20 seconds to run. With two locales it took
65 seconds. The same can be observed when running with GASNET_SPAWNFN=S with
the UDP conduit on two separate machines. Using the MPI conduit didn't make a
difference. Here are the environment variables I'm using:

CHPL_HOME: /home/share/chapel/chapel-git
script location: /home/share/chapel/chapel-git/util
CHPL_HOST_PLATFORM: linux32
CHPL_HOST_COMPILER: gnu
CHPL_TARGET_PLATFORM: linux32
CHPL_TARGET_COMPILER: gnu
CHPL_TARGET_ARCH: unknown
CHPL_LOCALE_MODEL: flat
CHPL_COMM: gasnet
  CHPL_COMM_SUBSTRATE: udp
  CHPL_GASNET_SEGMENT: everything
CHPL_TASKS: fifo
CHPL_LAUNCHER: amudprun
CHPL_TIMERS: generic
CHPL_MEM: cstdlib
CHPL_MAKE: gmake
CHPL_ATOMICS: intrinsics
  CHPL_NETWORK_ATOMICS: none
CHPL_GMP: none
CHPL_HWLOC: none
CHPL_REGEXP: none
CHPL_WIDE_POINTERS: struct
CHPL_LLVM: none
CHPL_AUX_FILESYS: none

Any idea what could be causing this? As I said in a previous post, my target
environment is a heterogeneous (all x86, though) commodity cluster with
Ethernet connections, so either the UDP or the MPI conduit is the one I'd use.


18.02.2015, 23:41, "Greg Titus" <[email protected]>:
Hi John --

A little bit of follow-up to what Michael says here ...

The "nemesis" he refers to is the internal name of the particular Qthreads 
scheduler we use when CHPL_LOCALE_MODEL=flat. Our understanding is that the 
nemesis scheduler currently doesn't move qthreads (and by extension, Chapel 
tasks) from pthread to pthread, which would break the use of pthread local 
storage inside the remote caching implementation. But there are significant 
caveats here:

  *   We use a different Qthreads scheduler when CHPL_LOCALE_MODEL=numa, and 
that one definitely does move qthreads (thus Chapel tasks) from pthread to 
pthread.
  *   We can't guarantee that we'll always use "nemesis" with the flat locale 
model.
  *   We can't guarantee that, even if we do keep using it, "nemesis" will 
continue to not move qthreads (thus Chapel tasks) from pthread to pthread.

Taken together, this basically says that although we haven't observed remote
caching failures with qthreads, the absence of failures shouldn't be taken as
evidence that it works reliably now or will continue to work in the future.

greg


On 2/18/2015 2:31 PM, Michael Ferguson wrote:
Hi -

One more thing about the --cache-remote feature, just to be clear and for 
future reference:

The remote caching depends on pthread-local storage, and Chapel task movement
among worker pthreads in Qthreads-based tasking could break it. So far we
haven't seen this happen, but we cannot guarantee it won't. Symptoms of a
failure could include silently wrong answers or segfaults, either of which
could be consistent or intermittent.

I *think* that this problem won't come up with the nemesis qthreads scheduler, 
but we need to do some careful analysis before we can declare the 
--cache-remote feature safe to use with qthreads.

Cheers,

-michael


