Hi Kilian,

The cumulative-overhead counter indeed looks very odd. What it measures is the accumulated time *not* spent doing useful work. Let's break it apart, for the single-thread case:

- The overall runtime of your application appears to be about 2 seconds.
- The total number of tasks run is 127704, with an average task length of about 15 microseconds.
- The cumulative-overhead and the idle rate are closely related: about 5.14% of the total runtime is spent in the scheduler. This is a good starting point.
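As far as I can tell (please take this as my reading of the counters, not gospel), the idle rate is essentially the overhead divided by the sum of useful work and overhead. Your single-thread numbers are consistent with that:

    idle rate ~ overhead / (useful work + overhead)
              ~ 1.03483e+08 / (1.90769e+09 + 1.03483e+08)
              ~ 0.051

which matches the reported idle-rate of 514 [0.01%], i.e. 5.14%. The same relation is consistent with the roughly 44% and 89% idle rates of your two- and eight-core runs below.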
Now, looking at the two-core case: the overall number of tasks doesn't change significantly, but the average task runtime increases noticeably. What you observe in addition is that the idle rate is now clocking in at 44%, which is consistent with the accumulated time spent in the scheduler (the overhead) relative to the accumulated time spent doing useful work. The eight-core run then follows the same trend.

This could mean a variety of things, but the obvious one is this: the average task duration is too short, i.e. your granularity appears to be too fine. IIRC, we recommend an average task length of about 100 microseconds (an order of magnitude more than what you measured). What you are essentially seeing is an increase in overhead caused by the scheduler's task queues fighting over work.

To quickly test this hypothesis, you could artificially increase the runtime of your tasks and see whether that improves scalability. If it does, we have a winner, and you would need to implement some kind of granularity control so that each of your tasks performs about 10 times more work, which makes it much easier to amortize the overheads.
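To make the granularity-control idea concrete, here is a very rough, untested sketch. The names process_item and spawn_chunked are made up for illustration, and the chunk size of 10 is just the "about 10 times more work" rule of thumb from above; in your code this would fold into the places where you currently call hpx::apply once per work item:

#include <hpx/hpx.hpp>

#include <algorithm>
#include <cstddef>

// Stand-in for your real per-item work (~15 microseconds each, per your counters).
void process_item(std::size_t i)
{
    (void) i;
}

// Instead of one hpx::apply per item, schedule one task per chunk of items,
// so that each HPX thread runs for roughly 100 microseconds or more.
void spawn_chunked(std::size_t num_items, std::size_t chunk_size = 10)
{
    for (std::size_t begin = 0; begin < num_items; begin += chunk_size)
    {
        std::size_t end = (std::min)(begin + chunk_size, num_items);
        hpx::apply([begin, end]() {
            for (std::size_t i = begin; i != end; ++i)
                process_item(i);
        });
    }
}

For the quick test of the hypothesis you don't even need this: temporarily make each existing task do, say, 10 times its current work and compare the idle rates again.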
Hope that helps,
Thomas

On 01/09/2018 03:55 PM, Kilian Werner wrote:
> Dear Prof. Kaiser,
>
> it is a new application we are building with hpx used for
> parallelization (and a later distribution in mind).
> We have been working on it for the last two months and the
> speedups have been this bad right from the start.
> However, since the obvious, inherent bottlenecks were dealt
> with one after the other, we went to performance counters
> for more detailed profiling.
>
> Thanks,
> Kilian Werner
>
> On Tue, 9 Jan 2018 08:47:12 -0600
> "Hartmut Kaiser" <[email protected]> wrote:
>> Kilian,
>>
>> Was this slowdown happening always or did it just
>> start to be bad recently?
>>
>> Thanks!
>> Regards Hartmut
>> ---------------
>> http://boost-spirit.com
>> http://stellar.cct.lsu.edu
>>
>>
>>> -----Original Message-----
>>> From: [email protected] [mailto:hpx-users-
>>> [email protected]] On Behalf Of Kilian Werner
>>> Sent: Tuesday, January 9, 2018 7:46 AM
>>> To: [email protected]
>>> Subject: [hpx-users] Performance Counter Data Interpretation
>>>
>>> Dear hpx user list,
>>>
>>> one of our projects shows unexpectedly bad speedups when
>>> supplying additional OS-worker-threads to HPX.
>>> The project is run locally and in parallel on a machine
>>> with 8 cores; trying to pin down the parallelization
>>> bottleneck, we printed the built-in HPX performance
>>> counters as seen below.
>>> The parallelization is achieved by scheduling tasks with
>>> hpx::apply that themselves will schedule additional tasks
>>> with hpx::apply.
>>> The program terminates after a final task (that can
>>> identify itself and will always finish last, independent
>>> of task scheduling order) fires an event.
>>> Synchronization is performed with some
>>> hpx::lcos::local::mutex locks.
>>>
>>> The problem seems to be apparent when looking at the
>>> harshly growing cumulative-overhead per worker-thread
>>> when employing more OS threads.
>>> However, we are a bit clueless as to how to interpret the
>>> meaning of this cumulative-overhead counter.
>>> We were especially surprised to find that the
>>> per-worker-thread overhead at some point came close to and
>>> even surpassed the total cumulative runtime (see the
>>> cumulative overhead of worker thread 0 when run with 8 OS
>>> threads vs. the total cumulative runtime).
>>>
>>> What exactly does the performance counter
>>> /threads/time/cumulative-overhead measure?
>>> How can the overhead be larger than the total execution time?
>>> How could we narrow down the causes for the growing
>>> overhead? For example, how could we measure how much time
>>> is spent waiting at (specific) mutexes in total?
>>>
>>> Thanks in advance,
>>>
>>> Kilian Werner
>>>
>>>
>>>
>>> --hpx:threads 1:
>>>
>>> /threads{locality#0/total/total}/count/cumulative,1,2.015067,[s],127704
>>> /threads{locality#0/total/total}/time/average,1,2.015073,[s],14938,[ns]
>>> /threads{locality#0/total/total}/time/cumulative,1,2.015074,[s],1.90769e+09,[ns]
>>> /threads{locality#0/total/total}/time/cumulative-overhead,1,2.015076,[s],1.03483e+08,[ns]
>>> /threads{locality#0/pool#default/worker-thread#0}/time/cumulative-overhead,1,2.015076,[s],1.03483e+08,[ns]
>>> /threads{locality#0/total/total}/idle-rate,1,2.015078,[s],514,[0.01%]
>>>
>>> --hpx:threads 2:
>>>
>>> /threads{locality#0/total/total}/count/cumulative,1,1.814639,[s],112250
>>> /threads{locality#0/total/total}/time/average,1,1.814644,[s],17986,[ns]
>>> /threads{locality#0/total/total}/time/cumulative,1,1.814654,[s],2.01907e+09,[ns]
>>> /threads{locality#0/total/total}/time/cumulative-overhead,1,1.814647,[s],1.60469e+09,[ns]
>>> /threads{locality#0/pool#default/worker-thread#0}/time/cumulative-overhead,1,1.814599,[s],1.12562e+09,[ns]
>>> /threads{locality#0/pool#default/worker-thread#1}/time/cumulative-overhead,1,1.814649,[s],4.79071e+08,[ns]
>>> /threads{locality#0/total/total}/idle-rate,1,1.814603,[s],4428,[0.01%]
>>>
>>> --hpx:threads 8:
>>>
>>> /threads{locality#0/total/total}/count/cumulative,1,4.597361,[s],109476
>>> /threads{locality#0/total/total}/time/average,1,4.597373,[s],37988,[ns]
>>> /threads{locality#0/total/total}/time/cumulative,1,4.597335,[s],4.1588e+09,[ns]
>>> /threads{locality#0/total/total}/time/cumulative-overhead,1,4.597325,[s],3.25232e+10,[ns]
>>> /threads{locality#0/pool#default/worker-thread#0}/time/cumulative-overhead,1,4.597408,[s],4.20735e+09,[ns]
>>> /threads{locality#0/pool#default/worker-thread#1}/time/cumulative-overhead,1,4.597390,[s],4.08787e+09,[ns]
>>> /threads{locality#0/pool#default/worker-thread#2}/time/cumulative-overhead,1,4.597385,[s],3.62298e+09,[ns]
>>> /threads{locality#0/pool#default/worker-thread#3}/time/cumulative-overhead,1,4.597358,[s],4.12475e+09,[ns]
>>> /threads{locality#0/pool#default/worker-thread#4}/time/cumulative-overhead,1,4.597338,[s],4.10011e+09,[ns]
>>> /threads{locality#0/pool#default/worker-thread#5}/time/cumulative-overhead,1,4.597402,[s],4.14242e+09,[ns]
>>> /threads{locality#0/pool#default/worker-thread#6}/time/cumulative-overhead,1,4.597353,[s],4.13593e+09,[ns]
>>> /threads{locality#0/pool#default/worker-thread#7}/time/cumulative-overhead,1,4.597408,[s],4.13275e+09,[ns]
>>> /threads{locality#0/total/total}/idle-rate,1,4.597350,[s],8867,[0.01%]

_______________________________________________
hpx-users mailing list
[email protected]
https://mail.cct.lsu.edu/mailman/listinfo/hpx-users
