Hi Bibek --
The following code:
>   forall i in arow do
>     forall j in bcol do
>       forall k in acol do
>         c(i,j) += a(i,k)*b(k,j);
is problematic: because the k loop is also a forall, multiple tasks may
execute for the same (i,j) indices, so you could have a
read-read-write-write race on c(i,j). In particular, say that you're the
task executing (i=1, j=1, k=1)
and I'm the task executing (i=1, j=1, k=2). If we hit the inner loop at
approximately the same time, we may both read the same value of c(i,j),
each do our accumulation into our local copy of that read, and then each
write our results back to c(i,j), in which case one of our two writes
would get lost. This suggests you either shouldn't parallelize the k
loop, or that you should use some sort of synchronization to ensure that
this race doesn't occur (e.g., sync variables, atomics, or reductions).
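
As a minimal sketch of two of the race-free options mentioned above (a
serial k loop, and a reduction over k): the problem size n, the array
contents, and the second result array d are hypothetical, chosen only for
illustration. The "with (ref ...)" clauses are there because recent Chapel
releases, as far as I know, expect an explicit ref intent when a forall
body modifies an outer array; older releases should accept the loops
without them.

```chapel
// Hypothetical problem size and data, chosen only for illustration.
config const n = 4;
const arow = 1..n, acol = 1..n, bcol = 1..n;

var a: [arow, acol] real = 1.0;
var b: [acol, bcol] real = 2.0;
var c: [arow, bcol] real;   // elements default to 0.0
var d: [arow, bcol] real;   // elements default to 0.0

// Option 1: parallelize over (i,j) only and keep the k loop serial.
// Each c(i,j) is updated by exactly one task, so there is no race.
forall (i,j) in {arow, bcol} with (ref c) do
  for k in acol do
    c(i,j) += a(i,k) * b(k,j);

// Option 2: express the k loop as a + reduction. The reduction combines
// the partial products safely, and each d(i,j) is written exactly once.
forall (i,j) in {arow, bcol} with (ref d) do
  d(i,j) = + reduce [k in acol] (a(i,k) * b(k,j));

writeln(c);
writeln(d);
```

Both loops should produce the same matrix. Which one is preferable likely
depends on the sizes involved: when |arow| * |bcol| already exceeds the
number of cores, the serial k loop gives up little parallelism.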
Correcting something you wrote:
> 2nd is a parallel code .. it creates i*j*k number of thread internally,
If by i*j*k threads you mean |arow| * |bcol| * |acol|, then this isn't
correct -- forall loops generally create a number of tasks proportional
to the machine's hardware parallelism, not to the number of iterations
in the loop. E.g., running:

    forall i in 1..1000000

will create here.numCores() tasks (a task per processor core), not
1,000,000 tasks. For more information on how tasks are created for forall
loops by default, refer to "Controlling Degree of Data Parallelism" in
README.executing in the doc/ directory of the release.

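
One rough way to check the task count for yourself is sketched below. It
assumes a Chapel release recent enough to support task-private variables
in a forall's "with" clause and to provide here.maxTaskPar (the name that
newer releases use for the locale's available parallelism); the counter
name numTasks is hypothetical.

```chapel
// Shared counter; atomics are safe to update from many tasks at once.
var numTasks: atomic int;

// 'myId' is a task-private variable: its initializer runs once per task
// that the forall creates, not once per iteration, so each task bumps
// the counter exactly once.
forall i in 1..1000000 with (var myId = numTasks.fetchAdd(1)) {
  // The loop body intentionally does no work.
}

writeln("iterations: 1000000, tasks used: ", numTasks.read());
writeln("here.maxTaskPar = ", here.maxTaskPar);
```

The reported task count should be on the order of the core count rather
than 1,000,000, though the exact number can vary with what else is
running. Rerunning the program with --dataParTasksPerLocale=2 on its
command line should lower the count accordingly.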
-Brad