I'll be happy to run the benchmark. Can you give me the details on how to
actually run it?

Regards,
Elias


On 17 April 2014 01:56, Juergen Sauermann <[email protected]> wrote:

> Hi,
>
> I have created a benchmark program that measures the startup (fork) and
> finish (join)
> times of OMP. It also compares them with a hand-crafted fork/join.
>
> The manual implementation uses an O(log(P)) algorithm for forking and
> joining, compared to
> what is apparently an O(P) algorithm in OMP. It would therefore be very
> interesting if
> Elias could run it on his 80-core machine. On my dual-core machine the
> difference between the two
> types of algorithm should be minor.
>
> The first run of both algorithms seemed to suggest that the hand-crafted
> version is much faster
> than OMP:
>
> Pass 0: 2 cores/threads, 15330 cycles total (hand-crafted)
>
> Pass 0: 2 cores/threads, 99197 cycles total (OMP)
>
>
> But then came a surprise when I ran the benchmark loop several times in a
> row:
>
> ./Parallel 2 (hand-crafted)
> Pass 0: 2 cores/threads, 17542 cycles total
> Pass 1: 2 cores/threads, 21070 cycles total
> Pass 2: 2 cores/threads, 19075 cycles total
> Pass 3: 2 cores/threads, 18249 cycles total
> Pass 4: 2 cores/threads, 16415 cycles total
>
> ./Parallel_OMP 2 (OMP)
> Pass 0: 2 cores/threads, 1213632 cycles total
> Pass 1: 2 cores/threads, 5831 cycles total
> Pass 2: 2 cores/threads, 2434215 cycles total
> Pass 3: 2 cores/threads, 5705 cycles total
> Pass 4: 2 cores/threads, 5215 cycles total
>
> The details in the OMP case reveal that most of the time is spent on fork
> (which is different from Elias' earlier results, where join took the most
> time).
> It looks a little like code loading (shared lib?) might be the issue for OMP.
>
> /// Jürgen
>
>
>
>
