On Thursday 18 March 2010 22:44:55, Simon Marlow wrote:
> On 17/03/10 21:30, Daniel Fischer wrote:
> > On Wednesday 17 March 2010 19:49:57, Artyom Kazak wrote:
> >> Hello!
> >> I tried to implement the parallel Monte-Carlo method of computing Pi
> >> number, using two cores:
> >
> > <move>
> >
> >> But it uses only one core:
> >
> > <snip>
> >
> >> We see that our one spark is pruned. Why?
> >
> > Well, the problem is that your tasks don't do any real work - yet.
> > piMonte returns a thunk pretty immediately, that thunk is then
> > evaluated by show, long after your chance for parallelism is gone. You
> > must force the work to be done _in_ r1 and r2, then you get
> > parallelism:
> >
> >   Generation 0:  2627 collections,  2626 parallel,  0.14s,  0.12s elapsed
> >   Generation 1:     1 collections,     1 parallel,  0.00s,  0.00s elapsed
> >
> >   Parallel GC work balance: 1.79 (429262 / 240225, ideal 2)
> >
> >                         MUT time (elapsed)    GC time (elapsed)
> >   Task  0 (worker) :    0.00s    (  8.22s)    0.00s   (  0.00s)
> >   Task  1 (worker) :    8.16s    (  8.22s)    0.01s   (  0.01s)
> >   Task  2 (worker) :    8.00s    (  8.22s)    0.13s   (  0.11s)
> >   Task  3 (worker) :    0.00s    (  8.22s)    0.00s   (  0.00s)
> >
> >   SPARKS: 1 (1 converted, 0 pruned)
> >
> >   INIT  time    0.00s  (  0.00s elapsed)
> >   MUT   time   16.14s  (  8.22s elapsed)
> >   GC    time    0.14s  (  0.12s elapsed)
> >   EXIT  time    0.00s  (  0.00s elapsed)
> >   Total time   16.29s  (  8.34s elapsed)
> >
> >   %GC time       0.9%  (1.4% elapsed)
> >
> >   Alloc rate    163,684,377 bytes per MUT second
> >
> >   Productivity  99.1% of total user, 193.5% of total elapsed
> >
> > But alas, it is slower than the single-threaded calculation :(
> >
> >   INIT  time    0.00s  (  0.00s elapsed)
> >   MUT   time    7.08s  (  7.10s elapsed)
> >   GC    time    0.08s  (  0.08s elapsed)
> >   EXIT  time    0.00s  (  0.00s elapsed)
> >   Total time    7.15s  (  7.18s elapsed)
>
> It works for me (GHC 6.12.1):
>
>   SPARKS: 1 (1 converted, 0 pruned)
>
>   INIT  time    0.00s  (  0.00s elapsed)
>   MUT   time    9.05s  (  4.54s elapsed)
>   GC    time    0.12s  (  0.09s elapsed)
>   EXIT  time    0.00s  (  0.01s elapsed)
>   Total time    9.12s  (  4.63s elapsed)
>
> wall-clock speedup of 1.93 on 2 cores.

Is that Artyom's original code or the version with the pseq'ed length?
The original didn't convert any sparks for me (~103% CPU because of the
parallel GC, but the calculation always used just one thread).
I'm also using 6.12.1. With -N2 I also get a productivity of 193.5%,
but the elapsed time is larger than the elapsed time with -N1. How long
does it take with -N1 for you?

It's the same with 6.10.3: no converted sparks for the original code,
parallelism with the pseq'ed length, but it takes longer than with -N1.

>
> What hardware are you using there?

3.06GHz Pentium 4, 2 cores.
I have mixed results with parallelism: some programmes get a speed-up of
nearly a factor of 2 (wall-clock time), others 1.4, 1.5 or so, yet others
take about the same wall-clock time as the single-threaded programme, and
some - like this one - take longer despite using both cores intensively.

> Have you tried changing any GC settings?

I've played around a little with -qg and -qb and -C, but that had
little effect. Any tips on what else might be worth a try?

>
> Cheers,
> Simon
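
For reference, a minimal sketch of what "forcing the work to be done _in_
r1 and r2" can look like. This is not Artyom's code (that was snipped from
this message), and it does not use the pseq'ed length of a list either:
it swaps in a strict counting loop, so that evaluating r1 or r2 to a plain
Int already performs the whole simulation. The names next, toUnit, hits and
piEstimate are hypothetical, and the small linear congruential generator
is only a stand-in for a real random source so that the example needs
nothing beyond base.

```haskell
{-# LANGUAGE BangPatterns #-}

import Data.Bits (shiftR)
import Data.Word (Word64)
-- par and pseq as exported by base; Control.Parallel (package "parallel")
-- provides functions with the same names and types.
import GHC.Conc (par, pseq)

-- Hypothetical stand-in random source: one step of a 64-bit LCG.
next :: Word64 -> Word64
next s = s * 6364136223846793005 + 1442695040888963407

-- Map the top 53 bits of the state to a Double in [0, 1).
toUnit :: Word64 -> Double
toUnit s = fromIntegral (s `shiftR` 11) / 9007199254740992

-- Count how many of n pseudo-random points fall inside the unit quarter
-- circle. The bang patterns keep the loop strict, so the result is a
-- fully computed Int rather than a thunk that is evaluated later.
hits :: Word64 -> Int -> Int
hits seed n = go seed n 0
  where
    go !s !k !acc
      | k <= 0    = acc
      | otherwise =
          let s1   = next s
              s2   = next s1
              x    = toUnit s1
              y    = toUnit s2
              acc' = if x * x + y * y <= 1 then acc + 1 else acc
          in go s2 (k - 1) acc'

-- Spark r1, evaluate r2 on the current thread, then combine. Because
-- hits is strict, the spark has real work to do when it is converted.
piEstimate :: Int -> Double
piEstimate n =
  r1 `par` (r2 `pseq` (4 * fromIntegral (r1 + r2) / fromIntegral (2 * n)))
  where
    r1 = hits 1 n
    r2 = hits 2 n

main :: IO ()
main = print (piEstimate 5000000)
```

To try it, assuming the file is saved as Main.hs: compile with
"ghc -O2 -threaded Main.hs" (GHC 7.0 and later also need -rtsopts) and run
with "./Main +RTS -N2 -s". The SPARKS line of the -s output shows whether
the spark was converted or pruned. Whether this also beats the -N1 run on
a given machine is a separate question, as the numbers above show.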