Thanks very much for the detailed info and the various links! I did not even know that 
OpenMP is supported via `||` (though with some restrictions), so I will play 
around with it and also learn Weave (which seems more flexible and efficient).

And, I have several more questions...

1) Based on the above info, there are (roughly speaking) three different 
parallel approaches in Nim; is this understanding correct...?

  * OpenMP (via `||`)
  * Weave
  * Threads, channels, etc., as described in the Nim manual and "Nim in Action", 
Chap. 6 (and also the experimental parallel statement: 
[https://nim-lang.org/docs/manual_experimental.html#parallel-amp-spawn-parallel-statement](https://nim-lang.org/docs/manual_experimental.html#parallel-amp-spawn-parallel-statement))



2) In both OpenMP and Weave, is it essential to first cast a `seq` to 
`UncheckedArray` and use the latter in a parallel region (e.g., inside a for-loop 
with `||`, or a block within `init(Weave)` ... `exit(Weave)`)? In that case, is 
it also possible to cast a `Tensor` (in Arraymancer) to `UncheckedArray` by 
getting a raw (unsafe) data pointer somehow?
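
For concreteness, this is the kind of pattern I have in mind for the `||` case. It is only a minimal sketch: `squares` is a made-up name, and I am assuming GCC with `--passC:-fopenmp --passL:-fopenmp` (without those flags I would expect the loop to simply run serially). I have not checked which runtime checks, if any, need to be turned off for it to be safe.

```nim
# Hypothetical sketch: fill a seq through a raw pointer inside a `||` loop.
# Assumed compile line: nim c --passC:-fopenmp --passL:-fopenmp sketch.nim
proc squares(n: int): seq[float32] =
  result = newSeq[float32](n)
  if n == 0: return                      # result[0] would be out of bounds
  # View the seq's storage as an unchecked array (no bounds checks).
  let buf = cast[ptr UncheckedArray[float32]](result[0].addr)
  # `||` iterates over the inclusive range 0 .. n-1 and is mapped to
  # "#pragma omp parallel for" in the generated C code.
  for i in 0 || (n - 1):
    buf[i] = float32(i * i)              # each iteration writes its own slot

when isMainModule:
  echo squares(8)
```

Each iteration only touches `buf[i]`, so there is no shared write between iterations.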

3) In the example code of Weave for matrix transpose, `captures` are used for 
some variables:
    
    
    parallelFor j in 0 ..< N:
        captures: {M, N, bufIn, bufOut}
        parallelFor i in 0 ..< M:
          captures: {j, M, N, bufIn, bufOut}
    
    

Here, is the meaning of `captures` similar to `shared` in OpenMP (roughly 
speaking)...?

4) I have installed Weave-0.4.0 (+ Nim-1.2.2) and tried the matrix transpose 
code shown on the GitHub page. I also added the code below at the end of 
main() to print some array elements:
    
    
    # In proc main():
      ...
      init(Weave)
      transpose(M, N, bufIn, bufOut)
      exit(Weave)
      
      # Show some elements.
      echo("input  [2 * N + 5] = ", input[2 * N + 5], " (= ", bufIn[2 * N + 5], ")")
      echo("input  [5 * N + 2] = ", input[5 * N + 2], " (= ", bufIn[5 * N + 2], ")")
      echo()
      echo("output [2 * M + 5] = ", output[2 * M + 5], " (= ", bufOut[2 * M + 5], ")")
      echo("output [5 * M + 2] = ", output[5 * M + 2], " (= ", bufOut[5 * M + 2], ")")
    
    

Compiling as `nim c --threads:on test.nim` gives the expected result:
    
    
    input  [2 * N + 5] = 4005.0 (= 4005.0)
    input  [5 * N + 2] = 10002.0 (= 10002.0)
    
    output [2 * M + 5] = 10002.0 (= 10002.0)
    output [5 * M + 2] = 4005.0 (= 4005.0)
    
    

On the other hand, if I move `exit(Weave)` after all the above `echo` 
statements, the result changes to:
    
    
    input  [2 * N + 5] = 4005.0 (= 4005.0)
    input  [5 * N + 2] = 10002.0 (= 10002.0)
    
    output [2 * M + 5] = 0.0 (= 0.0)
    output [5 * M + 2] = 0.0 (= 0.0)
    
    

Does this mean that `exit(Weave)` acts as some kind of "synchronization"(?) for 
parallel calculations, and is therefore mandatory before accessing any 
`UncheckedArray` used in the `parallelFor` regions?

5) Again, in the matrix transpose code above, the `input` (of type `seq`) is 
cast to `bufIn` (of type `UncheckedArray`) by using the address obtained from 
`input[0].unsafeAddr`. On the other hand, `.addr` is used to cast `output` to 
`bufOut`. Is this difference important, or is it OK to use either 
`.unsafeAddr` or `.addr` in both places?
    
    
    let input = newSeq[float32](M * N)
    let bufIn = cast[ptr UncheckedArray[float32]]( input[0].unsafeAddr )
    ...
    var output = newSeq[float32](N * M)
    let bufOut = cast[ptr UncheckedArray[float32]]( output[0].addr )
    
    
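
To make the question concrete, here is a stripped-down, serial version of what I mean, using only the standard library (no Weave). The proc name `transposeSketch`, the temporary `tmp`, and the fill values are made up for illustration; the two casts are copied from the snippet above.

```nim
# Hypothetical serial sketch of the two casts (standard library only).
proc transposeSketch(m, n: int): tuple[input, output: seq[float32]] =
  var tmp = newSeq[float32](m * n)
  for k in 0 ..< m * n:
    tmp[k] = float32(k)                  # input[i*n + j] holds i*n + j

  let input = tmp                        # immutable binding, as in the example
  # `input` is a `let`, so its address is taken with `unsafeAddr`.
  let bufIn = cast[ptr UncheckedArray[float32]](input[0].unsafeAddr)

  var output = newSeq[float32](n * m)
  # `output` is a `var`, so plain `addr` is accepted.
  let bufOut = cast[ptr UncheckedArray[float32]](output[0].addr)

  # Plain nested loops stand in for the two `parallelFor` loops.
  for i in 0 ..< m:
    for j in 0 ..< n:
      bufOut[j * m + i] = bufIn[i * n + j]

  result = (input, output)

when isMainModule:
  let r = transposeSketch(3, 4)
  echo r.input
  echo r.output
```

With Nim-1.2.2, the only difference I can see here is that `input` is declared with `let` and `output` with `var`.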

I am sorry again for the many questions! These are not urgent at all (I'm still 
learning more basic syntax...), but I would appreciate any hints and input. 
Thanks very much :)

PS. "seems like you love tea ;)" Yes, I have recently been enjoying Rooibos tea 
(particularly at night), though I like coffee in the morning :)
