Thanks very much for the detailed info and the various links! I did not even know that OpenMP is supported via `||` (though with some restrictions), so I will play around with it and also learn Weave (which seems more flexible and efficient).
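As a rough illustration of the `||` loop mentioned above, here is a minimal sketch. It assumes a GCC or Clang backend with OpenMP switched on through `--passC:-fopenmp --passL:-fopenmp`; the proc name `scale` and the file name in the comment are made up for this example. The loop writes through a raw pointer so that the body does not touch the `seq` itself.

```nim
# Sketch only: OpenMP-style parallel loop via the `||` iterator (system module).
# Assumed build command (file name is hypothetical):
#   nim c -d:release --passC:-fopenmp --passL:-fopenmp omp_sketch.nim
# Without the OpenMP flags the pragma is ignored and the loop runs serially.

proc scale(n: int): seq[float32] =
  ## Hypothetical helper: fills a seq with 2*i, one element per iteration.
  ## Assumes n > 0, since result[0] is accessed below.
  result = newSeq[float32](n)
  # Raw view of the seq's storage, used inside the parallel loop.
  let buf = cast[ptr UncheckedArray[float32]](result[0].addr)
  # `||` has the same precedence as `+`/`-`, so the upper bound needs parentheses.
  # The upper bound is inclusive.
  for i in 0 || (n - 1):
    buf[i] = float32(i) * 2.0'f32

let xs = scale(1000)
echo xs[10]
```

Each iteration writes a distinct element, so no synchronization between iterations is needed in this particular sketch.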
And I have several more questions...

1) Based on the above info, there are (roughly speaking) three different approaches to parallelism in Nim; is this understanding correct?

* OpenMP (via `||`)
* Weave
* Threads, channels, etc., as described in the Nim manual and "Nim in Action", Chap. 6 (and also the experimental parallel statement: [https://nim-lang.org/docs/manual_experimental.html#parallel-amp-spawn-parallel-statement](https://nim-lang.org/docs/manual_experimental.html#parallel-amp-spawn-parallel-statement))

2) In both OpenMP and Weave, is it essential to first cast a `seq` to `UncheckedArray` and use the latter in a parallel region (e.g., inside a for-loop with `||`, or a block within `init(Weave)` ... `exit(Weave)`)? In that case, is it also possible to cast a `Tensor` (in Arraymancer) to `UncheckedArray` by somehow getting a raw (unsafe) data pointer?

3) In the Weave example code for matrix transpose, `captures` is used for some variables:

    parallelFor j in 0 ..< N:
      captures: {M, N, bufIn, bufOut}
      parallelFor i in 0 ..< M:
        captures: {j, M, N, bufIn, bufOut}

Here, is the meaning of `captures` similar to `shared` in OpenMP (roughly speaking)?

4) I have installed Weave-0.4.0 (+ Nim-1.2.2) and tried the matrix transpose code shown on the GitHub page. I also added the code below at the end of main() to print some array elements:

    # In proc main():
    ...
    init(Weave)
    transpose(M, N, bufIn, bufOut)
    exit(Weave)

    # Show some elements.
    echo("input [2 * N + 5] = ", input[2 * N + 5], " (= ", bufIn[2 * N + 5], ")")
    echo("input [5 * N + 2] = ", input[5 * N + 2], " (= ", bufIn[5 * N + 2], ")")
    echo()
    echo("output [2 * M + 5] = ", output[2 * M + 5], " (= ", bufOut[2 * M + 5], ")")
    echo("output [5 * M + 2] = ", output[5 * M + 2], " (= ", bufOut[5 * M + 2], ")")

Compiling with `nim c --threads:on test.nim` gives the expected result:

    input [2 * N + 5] = 4005.0 (= 4005.0)
    input [5 * N + 2] = 10002.0 (= 10002.0)

    output [2 * M + 5] = 10002.0 (= 10002.0)
    output [5 * M + 2] = 4005.0 (= 4005.0)

On the other hand, if I move `exit(Weave)` after all the above `echo` statements, the result changes to

    input [2 * N + 5] = 4005.0 (= 4005.0)
    input [5 * N + 2] = 10002.0 (= 10002.0)

    output [2 * M + 5] = 0.0 (= 0.0)
    output [5 * M + 2] = 0.0 (= 0.0)

Does this mean that `exit(Weave)` acts as some kind of "synchronization" point for the parallel calculations, and so is mandatory before accessing any `UncheckedArray` used in the `parallelFor` regions?

5) Again, in the matrix transpose code above, `input` (of type `seq`) is cast to `bufIn` (of type `UncheckedArray`) by using the address obtained from `input[0].unsafeAddr`. On the other hand, `.addr` is used to cast `output` to `bufOut`. Is this difference important, or is it OK to use either `.unsafeAddr` or `.addr`?

    let input = newSeq[float32](M * N)
    let bufIn = cast[ptr UncheckedArray[float32]]( input[0].unsafeAddr )
    ...
    var output = newSeq[float32](N * M)
    let bufOut = cast[ptr UncheckedArray[float32]]( output[0].addr )

I am sorry again for the many questions! These are not urgent at all (I'm still learning more basic syntax...), but I would appreciate any hints and input. Thanks very much :)

PS. "seems like you love tea ;)" Yes, lately I like to drink Rooibos tea (particularly at night), though I like coffee in the morning :)
