Thank you! I ran some tests with the `parMap` template from the malebolgia library, trying different `bulkSize` and `ThreadPoolSize` values. It works, but it is slower than the unparallelized version. It seems that my function is simply too cheap to be worth parallelizing: the scheduling overhead outweighs the work done per call. I think this is consistent with what the _weave_ GitHub page says:
> Unfortunately existing framework requires computation to take 10000 cycles at minimum (Intel TBB) which corresponds to 3.33 µs on a 3 GHz CPU to amortize the cost of scheduling.
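
For what it's worth, the arithmetic behind that figure checks out, assuming a fixed 3 GHz clock:

$$
t = \frac{10\,000\ \text{cycles}}{3 \times 10^{9}\ \text{cycles/s}} \approx 3.33 \times 10^{-6}\ \text{s} = 3.33\ \mu\text{s}
$$

So if a single call to my function takes well under a few microseconds, the scheduling cost is likely to dominate, which would match the slowdown I observed.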