I've debugged the ReplicatedDist variant and it's pretty fast, even with strings. On a single MacBook Pro, I get the following timings (see the attached code).
            1 locale   2 locales   ratio
uint(32)    3.3s       62.8s       19x
string      4s         109s        27.25x
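
For reference, the ratio column is just the 2-locale time divided by the 1-locale time. A minimal sketch of that arithmetic in Python (not part of the attached Chapel code; the `timings` dict and `slowdown` helper are names made up for illustration):

```python
# Wall-clock times in seconds, copied from the table above:
# (1 locale, 2 locales)
timings = {
    "uint(32)": (3.3, 62.8),
    "string": (4.0, 109.0),
}

def slowdown(one_locale, two_locales):
    """Return how many times slower the 2-locale run is than the 1-locale run."""
    return two_locales / one_locale

for name, (t1, t2) in timings.items():
    print(f"{name:10s} {slowdown(t1, t2):.2f}x")
```

The same calculation applies to the per-iteration timings quoted further down in this thread.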
machine info: Darwin tw-mbp-bguarraci 14.3.0 Darwin Kernel Version 14.3.0:
Mon Mar 23 11:59:05 PDT 2015; root:xnu-2782.20.48~5/RELEASE_X86_64 x86_64
CHPL_HOME: /Users/bguarraci/src/chapel-1.11.0
script location: /Users/bguarraci/src/chapel-1.11.0/util
CHPL_HOST_PLATFORM: darwin
CHPL_HOST_COMPILER: clang
CHPL_TARGET_PLATFORM: darwin
CHPL_TARGET_COMPILER: clang
CHPL_TARGET_ARCH: native
CHPL_LOCALE_MODEL: flat
CHPL_COMM: gasnet
CHPL_COMM_SUBSTRATE: udp
CHPL_GASNET_SEGMENT: everything
CHPL_TASKS: fifo
CHPL_LAUNCHER: amudprun
CHPL_TIMERS: generic
CHPL_MEM: cstdlib
CHPL_MAKE: make
CHPL_ATOMICS: intrinsics
CHPL_NETWORK_ATOMICS: none
CHPL_GMP: none
CHPL_HWLOC: none
CHPL_REGEXP: none
CHPL_WIDE_POINTERS: struct
CHPL_LLVM: none
CHPL_AUX_FILESYS: none
On Wed, May 20, 2015 at 12:44 PM, Brian Guarraci <[email protected]> wrote:
> Hi Michael,
>
> Cool! I was indeed using fifo because qthreads doesn't compile on ARM
> (yet?). I will give the int variant a try to see how it performs.
>
> Also, I've been working on a version of the same code that uses your
> feedback. Here's the not-fully-tested variant. I'm also working on a
> variant that doesn't use Partitions at all, which should produce the
> simplest and fastest variant of all.
>
> On Wed, May 20, 2015 at 12:26 PM, Michael Ferguson <[email protected]>
> wrote:
>
>> Hi Brian -
>>
>> I tried running two different simplified versions of your program
>> on my laptop, using GASNET with the UDP conduit.
>>
>> One version uses strings (as you had). The other version replaces
>> the strings with uints but is conceptually the same.
>>
>> This table summarizes my results for 100 iterations:
>>
>> on my laptop (GASNet UDP, qthreads):
>>          1 locale      2 locales
>> string   0.000147 s    8.502195 s   -> 50,000x slower
>> uint     0.000024 s    0.000029 s   -> 1.2x slower
>>
>> on a linux server (GASNet UDP, qthreads):
>>          1 locale      2 locales
>> string   0.000311 s    0.013721 s   -> 44x slower
>> uint     0.000016 s    0.000035 s   -> 2x slower
>>
>>
>> But, if I change to CHPL_TASKS=fifo, I get this table:
>>
>> on my laptop (GASNet UDP, fifo):
>>          1 locale      2 locales
>> string   0.000129 s    0.003669 s   -> 28x slower
>> uint     0.000024 s    0.000032 s   -> 1.3x slower
>>
>>
>> on a linux server (GASNet UDP, fifo):
>>          1 locale      2 locales
>> string   0.000144 s    0.001600 s   -> 11x slower
>> uint     0.000015 s    0.000026 s   -> 1.7x slower
>>
>>
>>
>> I'm seeing similar, but less severe, problems on
>> a real distributed system. Since the problem is not
>> nearly as severe with CHPL_TASKS=fifo, I think that
>> there is a problem with qthreads and GASNet.
>>
>> That's in addition to string problems we already
>> know about.
>>
>> Brian, were you using CHPL_TASKS=fifo on the
>> ARM cluster when seeing the performance problem?
>>
>> I've attached the programs I was using in these experiments.
>>
>> Thanks,
>>
>> -michael
>>
crosstalk_replicated.chpl
Description: Binary data
_______________________________________________ Chapel-developers mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/chapel-developers
