I've debugged the ReplicatedDist variant and it's pretty fast, even with
strings.  On a single MacBook Pro, I get the following timings.  See attached
code.

            1 locale   2 locales   ratio
uint(32)    3.3 s      62.8 s      19x
string      4 s        109 s       27.25x
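
The ratio column is the 2-locale time divided by the 1-locale time. A
minimal Python sketch that recomputes it from the numbers above
(illustrative only, not part of the attached Chapel code; the names
`timings` and `slowdown` are made up for this example):

```python
# Timings in seconds, copied from the table above: (1 locale, 2 locales).
timings = {
    "uint(32)": (3.3, 62.8),
    "string": (4.0, 109.0),
}


def slowdown(one_locale, two_locales):
    """Return how many times slower the 2-locale run is than the 1-locale run."""
    return two_locales / one_locale


# Slowdown factor per element type.
ratios = {name: slowdown(*t) for name, t in timings.items()}

for name, ratio in ratios.items():
    print(f"{name:10s} {ratio:.2f}x")
```

The same helper can be applied to any of the 1-locale / 2-locale pairs
quoted further down in this thread.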

machine info: Darwin tw-mbp-bguarraci 14.3.0 Darwin Kernel Version 14.3.0:
Mon Mar 23 11:59:05 PDT 2015; root:xnu-2782.20.48~5/RELEASE_X86_64 x86_64
CHPL_HOME: /Users/bguarraci/src/chapel-1.11.0
script location: /Users/bguarraci/src/chapel-1.11.0/util
CHPL_HOST_PLATFORM: darwin
CHPL_HOST_COMPILER: clang
CHPL_TARGET_PLATFORM: darwin
CHPL_TARGET_COMPILER: clang
CHPL_TARGET_ARCH: native
CHPL_LOCALE_MODEL: flat
CHPL_COMM: gasnet
  CHPL_COMM_SUBSTRATE: udp
  CHPL_GASNET_SEGMENT: everything
CHPL_TASKS: fifo
CHPL_LAUNCHER: amudprun
CHPL_TIMERS: generic
CHPL_MEM: cstdlib
CHPL_MAKE: make
CHPL_ATOMICS: intrinsics
  CHPL_NETWORK_ATOMICS: none
CHPL_GMP: none
CHPL_HWLOC: none
CHPL_REGEXP: none
CHPL_WIDE_POINTERS: struct
CHPL_LLVM: none
CHPL_AUX_FILESYS: none



On Wed, May 20, 2015 at 12:44 PM, Brian Guarraci <[email protected]> wrote:

> Hi Michael,
>
> Cool!  I was indeed using fifo because qthreads doesn't compile on ARM
> (yet?).  I will give the int variant a try to see how it performs.
>
> Also, I've been working on a version of the same code that uses your
> feedback.  Here's the not-fully-tested variant.  I'm also working on a
> variant that doesn't use Partitions at all, which should produce the
> simplest and fastest variant of all.
>
> On Wed, May 20, 2015 at 12:26 PM, Michael Ferguson <[email protected]>
> wrote:
>
>> Hi Brian -
>>
>> I tried running two different simplified versions of your program
>> on my laptop, using GASNET with the UDP conduit.
>>
>> One version uses strings (as you had). The other version replaces
>> the strings with uints but is conceptually the same.
>>
>> This table summarizes my results for 100 iterations:
>>
>> on my laptop (GASNet UDP, qthreads):
>>             1 locale     2 locales
>> string      0.000147 s   8.502195 s  -> ~58,000x slower
>>   uint      0.000024 s   0.000029 s  -> 1.2x slower
>>
>> on a linux server (GASNet UDP, qthreads):
>>             1 locale     2 locales
>> string      0.000311 s   0.013721 s  -> 44x slower
>>   uint      0.000016 s   0.000035 s  -> 2x slower
>>
>>
>> But, if I change to CHPL_TASKS=fifo, I get this table:
>>
>> on my laptop (GASNet UDP, fifo):
>>             1 locale     2 locales
>> string      0.000129 s   0.003669 s  -> 28x slower
>>   uint      0.000024 s   0.000032 s  -> 1.3x slower
>>
>>
>> on a linux server (GASNet UDP, fifo):
>>             1 locale     2 locales
>> string      0.000144 s   0.001600 s  -> 11x slower
>>   uint      0.000015 s   0.000026 s  -> 1.7x slower
>>
>>
>>
>> I'm seeing similar, but not as severe, problems with
>> a real distributed system. Since the problem is not
>> nearly as severe with CHPL_TASKS=fifo, I think that
>> there is a problem with qthreads and GASNet.
>>
>> That's in addition to string problems we already
>> know about.
>>
>> Brian, were you using CHPL_TASKS=fifo on the
>> ARM cluster when seeing the performance problem?
>>
>> I've attached the programs I was using in these experiments.
>>
>> Thanks,
>>
>> -michael
>>
>> >
>>
>>
>

Attachment: crosstalk_replicated.chpl
Description: Binary data

_______________________________________________
Chapel-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/chapel-developers
