Kilian,
> it is a new application we are building with hpx used for
> parallelization (and a later distribution in mind).
> We have been working on it for the last two months and the
> speedups have been this bad right from the start.
> However since the obvious, inherent bottlenecks were dealt
> with one after the other we went to performance counters
> for more detailed profiling.
OK, I just wanted to make sure we didn't recently break something.
In general, high overheads point towards either too little parallelism
or too fine-grained tasks. From the performance counter data collected
when running on one core you can already tell that it's at least the
latter:
> threads{locality#0/total/total}/time/average,1,2.015073,[s],14938,[ns]
This tells you that the average execution time of your HPX threads is in
the range of 15 microseconds. Taking into account that the overhead
introduced by the creation, scheduling, running, and destruction of a
single HPX thread is in the range of one microsecond, this average time
hints that your parallelism is too fine-grained. I'd suggest trying to
combine several tasks into one to be run sequentially by a single HPX
thread (see the sketch below). Average execution times of 150-200
microseconds are a good target for HPX.
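As a quick sanity check on those numbers: 127704 HPX threads times
~14938ns average is about 1.9e+09ns, which matches the time/cumulative
counter, so the measurement itself is consistent. Below is a minimal
sketch of the kind of coarsening I mean; the per-element work() function
and the chunk size are made up for illustration and would have to be
adapted to your code:

#include <hpx/hpx_main.hpp>
#include <hpx/include/async.hpp>
#include <hpx/include/lcos.hpp>

#include <algorithm>
#include <cstddef>
#include <vector>

// placeholder for your per-element work, assumed to take a few microseconds
void work(std::size_t) { /* ... */ }

void process(std::size_t n)
{
    // before: one fire-and-forget HPX thread per element
    //     for (std::size_t i = 0; i != n; ++i)
    //         hpx::apply(&work, i);

    // after: each HPX thread handles 'chunk' elements sequentially, so the
    // per-thread overhead is amortized over much more real work
    std::size_t const chunk = 64;    // tune until tasks run ~150-200us
    std::vector<hpx::future<void>> tasks;
    for (std::size_t i = 0; i < n; i += chunk)
    {
        std::size_t last = (std::min)(i + chunk, n);
        tasks.push_back(hpx::async([i, last]() {
            for (std::size_t j = i; j != last; ++j)
                work(j);
        }));
    }
    hpx::wait_all(tasks);
}

int main()
{
    process(100000);    // arbitrary problem size for the example
    return 0;
}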
You can also see that your idle-rate increases significantly once you
increase the number of cores to run on. This might indicate that your
application does not expose enough parallelism to keep the number of
cores you are trying to use busy. Already on 2 cores your idle-rate goes
up to about 44%, which is quite a large number (idle-rates of up to 15%
are usually acceptable).
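To make the connection between those counters explicit (this is my
arithmetic based on the numbers you posted, so please double-check): with
2 OS threads the run took about 1.81s of wall-clock time, i.e. roughly
3.63s of combined core time, and the total time/cumulative-overhead
counter reports about 1.60e+09ns, i.e. 1.6s; 1.6/3.63 is about 44%, which
is essentially the reported idle-rate. Note also that the per-worker
overhead counters add up to the total, which is why with 8 cores the
summed overhead can exceed the wall-clock runtime.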
From what I can see, your application launches a lot of very small tasks
which depend sequentially on each other. Alternatively, you have a lot of
cross-thread synchronization going on, which prevents efficient
parallelization. But that is pure conjecture and I might be completely
wrong.
The fact that you're mostly using 'apply' to schedule your tasks suggests
that you have to do your own synchronization, which could be causing the
problems. Switching to futures as the means of synchronization (i.e.
using async instead of apply) and building asynchronous execution trees
out of those futures may reduce contention and may remove the need to
explicitly synchronize between threads. A small sketch of what that could
look like is below.
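Purely for illustration (the produce/transform/consume functions below
are made up and not anything from your code, and the exact headers may
differ slightly between HPX versions), this is roughly what I mean by
expressing the dependencies through futures instead of apply plus a
mutex:

#include <hpx/hpx_main.hpp>
#include <hpx/include/async.hpp>
#include <hpx/include/lcos.hpp>

#include <utility>

// hypothetical stages standing in for your real tasks
int produce() { return 42; }
int transform(int value) { return 2 * value; }
void consume(int left, int right) { (void) left; (void) right; }

int main()
{
    // each stage becomes a task returning a future; the dependency between
    // stages lives in the future, so no mutex is needed to hand over results
    hpx::future<int> a = hpx::async(produce);
    hpx::future<int> b = a.then(
        [](hpx::future<int> f) { return transform(f.get()); });
    hpx::future<int> c = hpx::async(produce);

    // dataflow runs consume() only once both of its inputs are ready
    hpx::future<void> done = hpx::dataflow(
        [](hpx::future<int> l, hpx::future<int> r)
        { consume(l.get(), r.get()); },
        std::move(b), std::move(c));

    done.get();    // wait for the whole tree instead of a hand-rolled event
    return 0;
}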
In general, I'd suggest using a tool like APEX (non-Windows only,
preferred for Linux) or Intel Amplifier (preferred for Windows, but works
on Linux as well) to visualize your real task dependencies at runtime.
Both tools require HPX to be built with special configuration, but both
of them are directly supported by HPX. Please let us know if you need
more information.
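If I remember correctly, that configuration amounts to building HPX with
-DHPX_WITH_APEX=ON for APEX, or with -DHPX_WITH_ITTNOTIFY=ON for the
Amplifier/VTune instrumentation, but please double-check the build
documentation for the HPX version you are using.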
HTH
Regards Hartmut
---------------
http://boost-spirit.com
http://stellar.cct.lsu.edu
>
> Thanks,
> Kilian Werner
>
> On Tue, 9 Jan 2018 08:47:12 -0600
> "Hartmut Kaiser" <[email protected]> wrote:
> > Kilian,
> >
> > Was this slowdown always happening or did it just start to be bad
> > recently?
> >
> > Thanks!
> > Regards Hartmut
> > ---------------
> > http://boost-spirit.com
> > http://stellar.cct.lsu.edu
> >
> >
> >> -----Original Message-----
> >> From: [email protected]
> >>[mailto:hpx-users-
> >> [email protected]] On Behalf Of Kilian Werner
> >> Sent: Tuesday, January 9, 2018 7:46 AM
> >> To: [email protected]
> >> Subject: [hpx-users] Performance Counter Data
> >> Interpretation
> >>
> >> Dear hpx user list,
> >>
> >> one of our projects shows unexpectedly bad speedups when
> >> supplying additional OS-worker-threads to HPX.
> >> The project is run locally and in parallel on a machine
> >> with 8 cores, trying to pin down the parallelization
> >> bottleneck we printed the built in HPX Performance
> >> Counters as seen below.
> >> The parallelization is achieved by scheduling tasks with
> >> hpx::apply that themselves will schedule additional tasks
> >> with hpx::apply.
> >> The program terminates after a final task (that can
> >> identify itself and will always finish last, independent
> >> of task scheduling order) fires an event.
> >> Synchronization is performed with some
> >> hpx::lcos::local::mutex locks.
> >>
> >> The problem seems to be apparent when looking at the
> >> sharply growing cumulative-overhead per worker-thread
> >> when employing more OS threads.
> >> However, we are a bit clueless as to how to interpret the
> >> meaning of this cumulative-overhead counter.
> >> We were especially surprised to find that the
> >> per-worker-thread overhead at some point came close to
> >> and even surpassed the total cumulative runtime (see the
> >> cumulative overhead of worker thread 0 when run with 8 OS
> >> threads vs. the total cumulative runtime).
> >>
> >> What exactly does the performance counter
> >> /threads/time/cumulative-overhead measure? How can the
> >> overhead be larger than the total execution time?
> >> How could we narrow down the causes of the growing
> >> overhead? For example, how could we measure how much time
> >> is spent waiting at (specific) mutexes in total?
> >>
> >> Thanks in advance,
> >>
> >> Kilian Werner
> >>
> >>
> >>
> >> --hpx:threads 1:
> >>
> >> /threads{locality#0/total/total}/count/cumulative,1,2.015067,[s],127704
> >> /threads{locality#0/total/total}/time/average,1,2.015073,[s],14938,[ns]
> >>
> >> /threads{locality#0/total/total}/time/cumulative,1,2.015074,[s],1.90769e+09,[ns]
> >> /threads{locality#0/total/total}/time/cumulative-overhead,1,2.015076,[s],1.03483e+08,[ns]
> >> /threads{locality#0/pool#default/worker-thread#0}/time/cumulative-overhead,1,2.015076,[s],1.03483e+08,[ns]
> >> /threads{locality#0/total/total}/idle-rate,1,2.015078,[s],514,[0.01%]
> >>
> >> --hpx:threads 2:
> >>
> >> /threads{locality#0/total/total}/count/cumulative,1,1.814639,[s],112250
> >> /threads{locality#0/total/total}/time/average,1,1.814644,[s],17986,[ns]
> >>
> >> /threads{locality#0/total/total}/time/cumulative,1,1.814654,[s],2.01907e+09,[ns]
> >> /threads{locality#0/total/total}/time/cumulative-overhead,1,1.814647,[s],1.60469e+09,[ns]
> >> /threads{locality#0/pool#default/worker-thread#0}/time/cumulative-overhead,1,1.814599,[s],1.12562e+09,[ns]
> >> /threads{locality#0/pool#default/worker-thread#1}/time/cumulative-overhead,1,1.814649,[s],4.79071e+08,[ns]
> >> /threads{locality#0/total/total}/idle-rate,1,1.814603,[s],4428,[0.01%]
> >>
> >> --hpx:threads 8:
> >>
> >> /threads{locality#0/total/total}/count/cumulative,1,4.597361,[s],109476
> >> /threads{locality#0/total/total}/time/average,1,4.597373,[s],37988,[ns]
> >>
> >> /threads{locality#0/total/total}/time/cumulative,1,4.597335,[s],4.1588e+09,[ns]
> >> /threads{locality#0/total/total}/time/cumulative-overhead,1,4.597325,[s],3.25232e+10,[ns]
> >> /threads{locality#0/pool#default/worker-thread#0}/time/cumulative-overhead,1,4.597408,[s],4.20735e+09,[ns]
> >> /threads{locality#0/pool#default/worker-thread#1}/time/cumulative-overhead,1,4.597390,[s],4.08787e+09,[ns]
> >> /threads{locality#0/pool#default/worker-thread#2}/time/cumulative-overhead,1,4.597385,[s],3.62298e+09,[ns]
> >> /threads{locality#0/pool#default/worker-thread#3}/time/cumulative-overhead,1,4.597358,[s],4.12475e+09,[ns]
> >> /threads{locality#0/pool#default/worker-thread#4}/time/cumulative-overhead,1,4.597338,[s],4.10011e+09,[ns]
> >> /threads{locality#0/pool#default/worker-thread#5}/time/cumulative-overhead,1,4.597402,[s],4.14242e+09,[ns]
> >> /threads{locality#0/pool#default/worker-thread#6}/time/cumulative-overhead,1,4.597353,[s],4.13593e+09,[ns]
> >> /threads{locality#0/pool#default/worker-thread#7}/time/cumulative-overhead,1,4.597408,[s],4.13275e+09,[ns]
> >> /threads{locality#0/total/total}/idle-rate,1,4.597350,[s],8867,[0.01%]
> >>
> >>
_______________________________________________
hpx-users mailing list
[email protected]
https://mail.cct.lsu.edu/mailman/listinfo/hpx-users