I have tried to write example code using _malebolgia_ and _taskpools_ (to be replaced later with my own functions).
The resulting code is extremely slow. I think this is my fault: the computational cost of the test function is negligible compared to the cost of creating pools/threads, or something like that. My real function is of course not as simple as the one in my test, but it is still cheap (0.1 to 1 microsecond per call), so I think I am following the wrong approach. My tests (each trying to execute this 1 million times) are:

```nim
import malebolgia

proc f(i: int): int {.gcsafe.} =
  return(i*i)

var store = newSeq[int](10)
var m = createMaster()
for j in 0..<1_000_000:
  # 10 tiny tasks are spawned and awaited on every outer iteration
  m.awaitAll:
    for i in 0 ..< 10:
      m.spawn f(i) -> store[i]
echo store
```

and

```nim
import std/cpuinfo, taskpools

proc f(i: int): int =
  return i * i

proc main() =
  let n = 1_000_000
  let nthreads = countProcessors()
  var tp = Taskpool.new(num_threads = nthreads)
  var store = newSeq[int](10)
  for j in 0..<n:
    # 10 tiny tasks are spawned and synced on every outer iteration
    var pendingFuts = newSeq[FlowVar[int]](10)
    for i in 0 ..< 10:
      pendingFuts[i] = tp.spawn f(i)
    for i in 0 ..< 10:
      store[i] = sync pendingFuts[i]
  tp.syncAll()
  tp.shutdown()
  echo store

main()
```

How can I make it faster than running sequentially on one core?
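To make the granularity concern concrete, here is a minimal sketch of a coarser-grained variant using the same _malebolgia_ calls as above. It assumes the outer iterations are independent, so that many of them can be grouped into a single task and the spawn cost is paid once per chunk rather than once per call to `f`. The names `runChunk` and `runAll` and the chunk count of 8 are hypothetical, chosen only for illustration. The sketch also sums the results instead of storing them, just to keep it short.

```nim
import malebolgia

proc f(i: int): int =
  i * i

# Hypothetical helper: one task runs a whole chunk of outer iterations
# sequentially, so spawning happens once per chunk, not once per f call.
proc runChunk(iterations: int): int {.gcsafe.} =
  result = 0
  for j in 0 ..< iterations:
    for i in 0 ..< 10:
      result += f(i)

# Hypothetical helper: splits `total` outer iterations into `nChunks` tasks.
# Assumes `total` is divisible by `nChunks` to keep the sketch short.
proc runAll(total, nChunks: int): int =
  let perChunk = total div nChunks
  var partial = newSeq[int](nChunks)
  var m = createMaster()
  m.awaitAll:
    for c in 0 ..< nChunks:
      m.spawn runChunk(perChunk) -> partial[c]
  # All tasks have finished once the awaitAll block ends.
  result = 0
  for p in partial:
    result += p

proc main() =
  # 8 chunks is an arbitrary illustrative value, not a tuned setting.
  echo runAll(1_000_000, 8)

main()
```

With this layout only 8 tasks are spawned in total instead of 10 million. Whether it actually beats the sequential loop for a function costing 0.1 to 1 microsecond would still need to be measured.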