I'm going to wind up this thread with some good news. I've run the
crosstalk_replicated.chpl code from the previous response on my 16-node ARM
cluster, and it runs more than 10x faster than where we started.
NOTE: these times include setup and shutdown, so subtract a few seconds
to get the actual compute time.
uint(32):
$ time ./crosstalk_replicated -nl 16
real    5m1.031s
user    0m0.199s
sys     0m0.638s

string:
$ time ./crosstalk_replicated -nl 16
real    6m33.824s
user    0m0.122s
sys     0m0.422s
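bash's `time` reports `real` (wall-clock) time as minutes plus seconds, so converting both 16-locale runs to seconds makes them easier to compare directly. The following is a minimal Python sketch, assuming the `real XmY.YYYs` format shown above; the helper name `real_seconds` is hypothetical and not part of the attached programs. Because these times include setup and shutdown, the resulting ratio is only approximate.

```python
import re


def real_seconds(line: str) -> float:
    """Convert a bash `time` line such as 'real 5m1.031s' to seconds.

    Assumes the 'real <minutes>m<seconds>s' layout printed above.
    """
    m = re.fullmatch(r"\s*real\s+(\d+)m([\d.]+)s\s*", line)
    if m is None:
        raise ValueError(f"unrecognized time line: {line!r}")
    minutes, seconds = m.groups()
    return int(minutes) * 60 + float(seconds)


# Wall-clock times of the two 16-locale runs (setup/shutdown included).
uint32_s = real_seconds("real 5m1.031s")
string_s = real_seconds("real 6m33.824s")

print(f"uint(32): {uint32_s:.3f} s")
print(f"string:   {string_s:.3f} s")
print(f"string / uint(32): {string_s / uint32_s:.2f}x")
```

On these numbers the string version takes roughly 1.3x as long as the uint(32) version.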
On Wed, May 20, 2015 at 2:17 PM, Brian Guarraci <[email protected]> wrote:
> I've debugged the ReplicatedDist variant and it's pretty fast, even with
> strings. On a single MacBook Pro, I get the results in the following table
> (see attached code).
>
>              1 locale   2 locales   ratio
> uint(32)     3.3 s      62.8 s      19x
> string       4 s        109 s       27.25x
>
> machine info: Darwin tw-mbp-bguarraci 14.3.0 Darwin Kernel Version 14.3.0:
> Mon Mar 23 11:59:05 PDT 2015; root:xnu-2782.20.48~5/RELEASE_X86_64 x86_64
> CHPL_HOME: /Users/bguarraci/src/chapel-1.11.0
> script location: /Users/bguarraci/src/chapel-1.11.0/util
> CHPL_HOST_PLATFORM: darwin
> CHPL_HOST_COMPILER: clang
> CHPL_TARGET_PLATFORM: darwin
> CHPL_TARGET_COMPILER: clang
> CHPL_TARGET_ARCH: native
> CHPL_LOCALE_MODEL: flat
> CHPL_COMM: gasnet
> CHPL_COMM_SUBSTRATE: udp
> CHPL_GASNET_SEGMENT: everything
> CHPL_TASKS: fifo
> CHPL_LAUNCHER: amudprun
> CHPL_TIMERS: generic
> CHPL_MEM: cstdlib
> CHPL_MAKE: make
> CHPL_ATOMICS: intrinsics
> CHPL_NETWORK_ATOMICS: none
> CHPL_GMP: none
> CHPL_HWLOC: none
> CHPL_REGEXP: none
> CHPL_WIDE_POINTERS: struct
> CHPL_LLVM: none
> CHPL_AUX_FILESYS: none
>
>
>
> On Wed, May 20, 2015 at 12:44 PM, Brian Guarraci <[email protected]> wrote:
>
>> Hi Michael,
>>
>> Cool! I was indeed using fifo because qthreads doesn't compile on ARM
>> (yet?). I will give the uint variant a try to see how it performs.
>>
>> Also, I've been working on a version of the same code that incorporates
>> your feedback. Here's the not-fully-tested variant. I'm also working on a
>> variant that doesn't use Partitions at all, which should be the simplest
>> and fastest of all.
>>
>> On Wed, May 20, 2015 at 12:26 PM, Michael Ferguson <[email protected]>
>> wrote:
>>
>>> Hi Brian -
>>>
>>> I tried running two different simplified versions of your program
>>> on my laptop, using GASNet with the UDP conduit.
>>>
>>> One version uses strings (as you had). The other version replaces
>>> the strings with uints but is conceptually the same.
>>>
>>> These tables summarize my results for 100 iterations:
>>>
>>> on my laptop (GASNet UDP, qthreads):
>>>           1 locale     2 locales
>>> string    0.000147 s   8.502195 s   -> ~58,000x slower
>>> uint      0.000024 s   0.000029 s   -> 1.2x slower
>>>
>>> on a Linux server (GASNet UDP, qthreads):
>>>           1 locale     2 locales
>>> string    0.000311 s   0.013721 s   -> 44x slower
>>> uint      0.000016 s   0.000035 s   -> 2x slower
>>>
>>>
>>> But if I change to CHPL_TASKS=fifo, I get these tables:
>>>
>>> on my laptop (GASNet UDP, fifo):
>>>           1 locale     2 locales
>>> string    0.000129 s   0.003669 s   -> 28x slower
>>> uint      0.000024 s   0.000032 s   -> 1.3x slower
>>>
>>>
>>> on a Linux server (GASNet UDP, fifo):
>>>           1 locale     2 locales
>>> string    0.000144 s   0.001600 s   -> 11x slower
>>> uint      0.000015 s   0.000026 s   -> 1.7x slower
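Each "N x slower" figure in the tables above is the 2-locale time divided by the 1-locale time. The small Python sketch below recomputes all eight ratios from the reported numbers (in seconds, as reported); the dictionary keys and the `slowdown` helper are illustrative only and are not taken from the attached programs. It shows, for example, that the string case on the laptop with qthreads works out to roughly 58,000x.

```python
# Times in seconds for 100 iterations, copied from the tables above.
# Key: (machine and tasking layer, element type) -> (1 locale, 2 locales)
timings = {
    ("laptop, qthreads", "string"): (0.000147, 8.502195),
    ("laptop, qthreads", "uint"):   (0.000024, 0.000029),
    ("linux, qthreads",  "string"): (0.000311, 0.013721),
    ("linux, qthreads",  "uint"):   (0.000016, 0.000035),
    ("laptop, fifo",     "string"): (0.000129, 0.003669),
    ("laptop, fifo",     "uint"):   (0.000024, 0.000032),
    ("linux, fifo",      "string"): (0.000144, 0.001600),
    ("linux, fifo",      "uint"):   (0.000015, 0.000026),
}


def slowdown(one_locale: float, two_locales: float) -> float:
    """How many times slower the 2-locale run is than the 1-locale run."""
    return two_locales / one_locale


for (setup, kind), (t1, t2) in timings.items():
    print(f"{setup:17} {kind:6} {slowdown(t1, t2):10,.1f}x slower")
```

Comparing the string rows across the qthreads and fifo tables is what points to the tasking layer rather than the strings alone.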
>>>
>>>
>>>
>>> I'm seeing similar, but less severe, problems on
>>> a real distributed system. Since the problem is not
>>> nearly as severe with CHPL_TASKS=fifo, I think that
>>> there is a problem with the combination of qthreads
>>> and GASNet.
>>>
>>> That's in addition to the string problems we already
>>> know about.
>>>
>>> Brian, were you using CHPL_TASKS=fifo on the
>>> ARM cluster when seeing the performance problem?
>>>
>>> I've attached the programs I was using in these experiments.
>>>
>>> Thanks,
>>>
>>> -michael
>>>
_______________________________________________
Chapel-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/chapel-developers