Dear Prof. Kaiser,

this is a new application we are building, with HPX used for parallelization (and later distribution in mind). We have been working on it for the last two months, and the speedups have been this bad right from the start. Once the obvious, inherent bottlenecks had been dealt with one after the other, we turned to performance counters for more detailed profiling.
Thanks,
Kilian Werner

On Tue, 9 Jan 2018 08:47:12 -0600 "Hartmut Kaiser" <[email protected]> wrote:

> Kilian,
>
> Was this slowdown happening always, or did it just start to be bad
> recently?
>
> Thanks!
> Regards Hartmut
> ---------------
> http://boost-spirit.com
> http://stellar.cct.lsu.edu
>
>> -----Original Message-----
>> From: [email protected] [mailto:[email protected]] On Behalf Of Kilian Werner
>> Sent: Tuesday, January 9, 2018 7:46 AM
>> To: [email protected]
>> Subject: [hpx-users] Performance Counter Data Interpretation
>>
>> Dear hpx user list,
>>
>> one of our projects shows unexpectedly bad speedups when supplying
>> additional OS worker threads to HPX. The project is run locally and
>> in parallel on a machine with 8 cores; trying to pin down the
>> parallelization bottleneck, we printed the built-in HPX performance
>> counters as seen below.
>> Parallelization is achieved by scheduling tasks with hpx::apply that
>> themselves schedule additional tasks with hpx::apply. The program
>> terminates after a final task (which can identify itself and will
>> always finish last, independent of task scheduling order) fires an
>> event. Synchronization is performed with some hpx::lcos::local::mutex
>> locks.
>>
>> The problem seems apparent in the sharply growing cumulative overhead
>> per worker thread when employing more OS threads. However, we are a
>> bit clueless as to the meaning of this cumulative-overhead counter.
>> We were especially surprised to find that the per-worker-thread
>> overhead at some point came close to, and even surpassed, the total
>> cumulative runtime (see the cumulative overhead of worker thread 0
>> when run with 8 OS threads vs. the total cumulative runtime).
>>
>> What exactly does the performance counter
>> /threads/time/cumulative-overhead measure? How can the overhead be
>> larger than the total execution time?
>> How could we narrow down the causes for the growing overhead? For
>> example, how could we measure how much time is spent waiting at
>> (specific) mutexes in total?
>>
>> Thanks in advance,
>>
>> Kilian Werner
>>
>>
>> --hpx:threads 1:
>>
>> /threads{locality#0/total/total}/count/cumulative,1,2.015067,[s],127704
>> /threads{locality#0/total/total}/time/average,1,2.015073,[s],14938,[ns]
>> /threads{locality#0/total/total}/time/cumulative,1,2.015074,[s],1.90769e+09,[ns]
>> /threads{locality#0/total/total}/time/cumulative-overhead,1,2.015076,[s],1.03483e+08,[ns]
>> /threads{locality#0/pool#default/worker-thread#0}/time/cumulative-overhead,1,2.015076,[s],1.03483e+08,[ns]
>> /threads{locality#0/total/total}/idle-rate,1,2.015078,[s],514,[0.01%]
>>
>> --hpx:threads 2:
>>
>> /threads{locality#0/total/total}/count/cumulative,1,1.814639,[s],112250
>> /threads{locality#0/total/total}/time/average,1,1.814644,[s],17986,[ns]
>> /threads{locality#0/total/total}/time/cumulative,1,1.814654,[s],2.01907e+09,[ns]
>> /threads{locality#0/total/total}/time/cumulative-overhead,1,1.814647,[s],1.60469e+09,[ns]
>> /threads{locality#0/pool#default/worker-thread#0}/time/cumulative-overhead,1,1.814599,[s],1.12562e+09,[ns]
>> /threads{locality#0/pool#default/worker-thread#1}/time/cumulative-overhead,1,1.814649,[s],4.79071e+08,[ns]
>> /threads{locality#0/total/total}/idle-rate,1,1.814603,[s],4428,[0.01%]
>>
>> --hpx:threads 8:
>>
>> /threads{locality#0/total/total}/count/cumulative,1,4.597361,[s],109476
>> /threads{locality#0/total/total}/time/average,1,4.597373,[s],37988,[ns]
>> /threads{locality#0/total/total}/time/cumulative,1,4.597335,[s],4.1588e+09,[ns]
>> /threads{locality#0/total/total}/time/cumulative-overhead,1,4.597325,[s],3.25232e+10,[ns]
>> /threads{locality#0/pool#default/worker-thread#0}/time/cumulative-overhead,1,4.597408,[s],4.20735e+09,[ns]
>> /threads{locality#0/pool#default/worker-thread#1}/time/cumulative-overhead,1,4.597390,[s],4.08787e+09,[ns]
>> /threads{locality#0/pool#default/worker-thread#2}/time/cumulative-overhead,1,4.597385,[s],3.62298e+09,[ns]
>> /threads{locality#0/pool#default/worker-thread#3}/time/cumulative-overhead,1,4.597358,[s],4.12475e+09,[ns]
>> /threads{locality#0/pool#default/worker-thread#4}/time/cumulative-overhead,1,4.597338,[s],4.10011e+09,[ns]
>> /threads{locality#0/pool#default/worker-thread#5}/time/cumulative-overhead,1,4.597402,[s],4.14242e+09,[ns]
>> /threads{locality#0/pool#default/worker-thread#6}/time/cumulative-overhead,1,4.597353,[s],4.13593e+09,[ns]
>> /threads{locality#0/pool#default/worker-thread#7}/time/cumulative-overhead,1,4.597408,[s],4.13275e+09,[ns]
>> /threads{locality#0/total/total}/idle-rate,1,4.597350,[s],8867,[0.01%]
>>
>> _______________________________________________
>> hpx-users mailing list
>> [email protected]
>> https://mail.cct.lsu.edu/mailman/listinfo/hpx-users
