Hi John --
(You already know the following as general knowledge, but I thought I'd
include it for people newer to multi-locale programming who may be
following this conversation.)
For context, with current Chapel it's normal for programs to suffer some
performance loss when moving from single-locale to multi-locale
execution. Using multiple locales offers more opportunity for
parallelism, but at the cost of reduced intra-task performance due to
network communication required for inter-locale variable references. The
effect varies across programs depending on how much remote communication
they do, but since a remote reference can easily take 1000 times as long
as a local one, it doesn't take much to have a big effect. For example,
with Chapel 1.10 on a Cray XC we don't see any drop-off in performance
from 1 locale to 2 on the Stream benchmark, because our Stream doesn't do
many inter-locale memory references. For the RA benchmark, however, which
does a lot of inter-locale references (in fact that's what it's
measuring), our multi-locale performance doesn't match single-locale
performance until we get up to 8-32 locales, depending on circumstances.
And the Cray XC has a very high-performance network compared to UDP or MPI
over Ethernet.
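
To put rough numbers on "it doesn't take much", here is a toy cost model in Python. It is only a sketch under simplifying assumptions: every local reference costs 1 unit, every remote reference costs 1000 units (the ratio mentioned above), and nothing overlaps. The function names are made up for illustration and are not part of Chapel.

```python
def slowdown(remote_fraction, remote_cost=1000.0):
    """Relative per-task run time when some references are remote.

    Assumption: a local reference costs 1 unit and a remote one costs
    remote_cost units, with no overlap of communication and computation.
    """
    if not 0.0 <= remote_fraction <= 1.0:
        raise ValueError("remote_fraction must be between 0 and 1")
    return (1.0 - remote_fraction) + remote_fraction * remote_cost


def remote_fraction_for(target_slowdown, remote_cost=1000.0):
    """Invert slowdown(): the remote fraction that gives the target slowdown."""
    return (target_slowdown - 1.0) / (remote_cost - 1.0)


# Show how quickly a small share of remote references dominates run time.
for fraction in (0.001, 0.01, 0.1):
    print(f"{fraction:.1%} remote -> {slowdown(fraction):.1f}x slower")
```

Under these assumptions, one remote reference in a thousand already roughly doubles per-task run time, and a 100x slowdown needs about one reference in ten to be remote.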
That said, the >100x slowdown you're seeing seems a little high, unless
your test case is really doing a lot of remote references. If it isn't,
or at least shouldn't be, perhaps you're seeing a lot of remote
communication for internal references to metadata within your
distribution code? If this is the case, then turning on remote caching
(--cache-remote)
could well improve matters. In fact that might be a good test to rule
this hypothesis in or out.
A secondary effect with GASNET_SPAWNFN=L could be oversubscription of
the processor cores due to running more than one Chapel locale per
compute node. To reduce the level of oversubscription you could set
CHPL_NUM_THREADS_PER_LOCALE to the number of compute-node cores divided
by the number of locales you're running on the compute node, but don't
set it to less than 2 or you could deadlock/livelock due to internal
starvation. However, if you're seeing the same slowdown with
GASNET_SPAWNFN=S and one Chapel locale per compute node then I don't
think this is something that is afflicting you right now.
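
The thread-count rule above can be sketched in a few lines of Python. This is just an illustration of the arithmetic: the helper name and the example value of two locales per node are hypothetical, and os.cpu_count() reports logical CPUs, which may differ from the number of physical cores.

```python
import os


def threads_per_locale(cores_per_node, locales_per_node, minimum=2):
    """Cores divided by co-located locales, but never below the minimum.

    The floor of 2 follows the advice above about avoiding
    deadlock/livelock from internal starvation.
    """
    if cores_per_node < 1 or locales_per_node < 1:
        raise ValueError("core and locale counts must be positive")
    return max(minimum, cores_per_node // locales_per_node)


# Hypothetical setup: two Chapel locales sharing this compute node.
cores = os.cpu_count() or 1
locales_on_node = 2
value = threads_per_locale(cores, locales_on_node)
print(f"export CHPL_NUM_THREADS_PER_LOCALE={value}")
```

The printed line can be pasted into the shell before launching the program. Note that when the floor of 2 kicks in (for example, 4 locales on a 4-core node), the node is still oversubscribed, just less so.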
hope this helps,
greg
On 2/19/2015 4:42 AM, John MacFrenz wrote:
Hi,
I'll give --cache-remote a try later. However, for now I'm facing some
problems which definitely should be solved first...
The problem I'm having is that using GASNET_SPAWNFN=L with the UDP conduit
and more than one locale causes the program to run _very_ slowly. For
example, with one locale my test program took 0.20 sec to run.
With two locales it took 65 seconds. The same can be observed when running
with GASNET_SPAWNFN=S with the UDP conduit on two separate machines. Using
the MPI conduit didn't make a difference. Here are the environment variables
I'm using:
CHPL_HOME: /home/share/chapel/chapel-git
script location: /home/share/chapel/chapel-git/util
CHPL_HOST_PLATFORM: linux32
CHPL_HOST_COMPILER: gnu
CHPL_TARGET_PLATFORM: linux32
CHPL_TARGET_COMPILER: gnu
CHPL_TARGET_ARCH: unknown
CHPL_LOCALE_MODEL: flat
CHPL_COMM: gasnet
CHPL_COMM_SUBSTRATE: udp
CHPL_GASNET_SEGMENT: everything
CHPL_TASKS: fifo
CHPL_LAUNCHER: amudprun
CHPL_TIMERS: generic
CHPL_MEM: cstdlib
CHPL_MAKE: gmake
CHPL_ATOMICS: intrinsics
CHPL_NETWORK_ATOMICS: none
CHPL_GMP: none
CHPL_HWLOC: none
CHPL_REGEXP: none
CHPL_WIDE_POINTERS: struct
CHPL_LLVM: none
CHPL_AUX_FILESYS: none
Any idea what could be causing this? As I said in a previous post,
my target environment is a heterogeneous (all x86, though) commodity
cluster with Ethernet connections, so either the UDP or the MPI conduit is
the one I'd use.
18.02.2015, 23:41, "Greg Titus" <[email protected]>:
Hi John --
A little bit of follow-up to what Michael says here ...
The "nemesis" he refers to is the internal name of the particular
Qthreads scheduler we use when CHPL_LOCALE_MODEL=flat. Our
understanding is that the nemesis scheduler currently doesn't move
qthreads (and by extension, Chapel tasks) from pthread to pthread,
which would break the use of pthread local storage inside the remote
caching implementation. But there are significant caveats here:
* We use a different Qthreads scheduler when
CHPL_LOCALE_MODEL=numa, and that one definitely does move
qthreads (thus Chapel tasks) from pthread to pthread.
* We can't guarantee that we'll always use "nemesis" with the flat
locale model.
* We can't guarantee that, even if we do keep using it, "nemesis"
will continue to not move qthreads (thus Chapel tasks) from
pthread to pthread.
Taken together, this basically says that although we haven't observed
remote caching failures with qthreads, that shouldn't be taken as
evidence that it definitely does work now or will work in the future.
greg
On 2/18/2015 2:31 PM, Michael Ferguson wrote:
Hi -
One more thing about the --cache-remote feature, just to be clear
and for future reference:
The remote caching depends on pthread local storage, and Chapel task
movement among worker pthreads in Qthreads-based tasking could break
it. So far we haven't seen this happen, but we cannot guarantee it
won't. Symptoms of a failure could include silent wrong answers or
segfaults, either of which could be solid or intermittent/sporadic.
I *think* that this problem won't come up with the nemesis qthreads
scheduler, but we need to do some careful analysis before we can
declare the --cache-remote feature safe to use with qthreads.
Cheers,
-michael
_______________________________________________
Chapel-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/chapel-users