I'll be happy to run the benchmark. Can you give me the details on how to actually run it?
Regards,
Elias

On 17 April 2014 01:56, Juergen Sauermann <[email protected]> wrote:

> Hi,
>
> I have created a benchmark program that measures the startup (fork) and
> finish (join) times of OMP. It also compares them with a hand-crafted
> fork/join.
>
> The manual implementation uses an O(log(P)) algorithm for forking and
> joining, compared to what I assume is an O(P) algorithm in OMP. It would
> therefore be very interesting if Elias could run it on his 80-core
> machine. On my dual-core the difference between the two types of
> algorithm should be minor.
>
> The first run of both algorithms seemed to suggest that the hand-crafted
> version is much faster than OMP:
>
> Pass 0: 2 cores/threads, 15330 cycles total (hand-crafted)
> Pass 0: 2 cores/threads, 99197 cycles total (OMP)
>
> But then came a surprise when I ran the benchmark loop several times in
> a row:
>
> ./Parallel 2 (hand-crafted)
> Pass 0: 2 cores/threads, 17542 cycles total
> Pass 1: 2 cores/threads, 21070 cycles total
> Pass 2: 2 cores/threads, 19075 cycles total
> Pass 3: 2 cores/threads, 18249 cycles total
> Pass 4: 2 cores/threads, 16415 cycles total
>
> ./Parallel_OMP 2 (OMP)
> Pass 0: 2 cores/threads, 1213632 cycles total
> Pass 1: 2 cores/threads, 5831 cycles total
> Pass 2: 2 cores/threads, 2434215 cycles total
> Pass 3: 2 cores/threads, 5705 cycles total
> Pass 4: 2 cores/threads, 5215 cycles total
>
> The details in the OMP case reveal that most of the time is spent on
> fork (which is different from Elias' earlier results, where join took
> the most time). It looks a little like code loading (shared lib?) might
> be the issue for OMP.
>
> /// Jürgen
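
For reference, the O(P) versus O(log(P)) distinction mentioned above can be illustrated with a small simulation. This is only a sketch of the general idea and not the code from the benchmark: the function names `tree_fork_schedule` and `linear_fork_steps` are made up for illustration, and the pairing rule (each running thread wakes the thread `stride` positions above it) is an assumption about how such a tree fork might be arranged.

```python
def tree_fork_schedule(p):
    """Return the rounds of a tree-shaped fork for p threads.

    Each round is a list of (parent, child) pairs. Thread 0 is assumed
    to be running already; in every round, each running thread starts
    one new thread, so the number of running threads doubles.
    """
    rounds = []
    stride = 1
    while stride < p:
        # Threads 0 .. stride-1 are running; each wakes thread id + stride.
        pairs = [(parent, parent + stride)
                 for parent in range(stride)
                 if parent + stride < p]
        rounds.append(pairs)
        stride *= 2
    return rounds


def linear_fork_steps(p):
    """A master that starts every worker itself needs p - 1 sequential steps."""
    return max(p - 1, 0)


if __name__ == "__main__":
    for p in (2, 80):
        rounds = tree_fork_schedule(p)
        print(p, "threads:", len(rounds), "tree rounds vs",
              linear_fork_steps(p), "linear steps")
```

For 2 threads both schemes need a single step, which fits the expectation that a dual-core machine shows little difference. For 80 threads the tree needs 7 rounds where a single master would need 79 sequential starts. A join can mirror the same schedule in reverse order.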
