I've narrowed the problem down to a single line of code: a `calloc()` of 2MB to 
20MB, called many times. My guess is that the standard malloc/calloc takes a 
lock for large allocations.

So there are probably several solutions:

  1. Rewrite the C code in Nim, which uses thread-local allocation by default.
  2. Use a different memory manager than the Linux/gcc default.
  3. Use multiprocessing (starting with Araq's suggestion).



This was probably not a case of false sharing. Still, I don't quite 
understand why the same code is fast in the main thread. Wouldn't a large 
allocation still take the lock there?
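
To check whether the slowdown really comes from the allocator and not from the rest of my code, a stand-alone test along these lines might help. It is only a sketch: the function names, the 8MB buffer size, and the iteration count are made up for illustration and are not taken from the original C code. It assumes Linux with pthreads, and it only touches the first and last byte of each buffer, so it measures the allocation calls rather than real use of the memory.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime under strict -std=c11 */
#endif
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical parameters, picked to fall inside the 2MB - 20MB range. */
#define ALLOC_BYTES ((size_t)8 * 1024 * 1024)
#define ITERATIONS  200
#define MAX_THREADS 64

/* calloc and free one large buffer `iters` times.
   Returns the number of allocations that succeeded. */
int calloc_loop(int iters, size_t bytes) {
    int ok = 0;
    if (bytes == 0) return 0;
    for (int i = 0; i < iters; i++) {
        volatile unsigned char *buf = calloc(1, bytes);
        if (buf != NULL) {
            buf[0] = 1;          /* touch the buffer so the call is not elided */
            buf[bytes - 1] = 1;
            ok++;
            free((void *)buf);
        }
    }
    return ok;
}

struct job { int iters; size_t bytes; int ok; };

static void *worker(void *arg) {
    struct job *j = arg;
    j->ok = calloc_loop(j->iters, j->bytes);
    return NULL;
}

/* Run calloc_loop concurrently in `nthreads` threads.
   Returns the total number of successful allocations, or -1 on error. */
int calloc_in_threads(int nthreads, int iters, size_t bytes) {
    pthread_t tids[MAX_THREADS];
    struct job jobs[MAX_THREADS];
    int started = 0, total = 0;
    if (nthreads < 1 || nthreads > MAX_THREADS) return -1;
    for (int t = 0; t < nthreads; t++) {
        jobs[t] = (struct job){ iters, bytes, 0 };
        if (pthread_create(&tids[t], NULL, worker, &jobs[t]) != 0) break;
        started++;
    }
    for (int t = 0; t < started; t++) {
        pthread_join(tids[t], NULL);
        total += jobs[t].ok;
    }
    return started == nthreads ? total : -1;
}

static double now_sec(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Same total number of allocations, once sequentially in the calling
   thread and once spread over `nthreads` worker threads. */
void compare_main_vs_threads(int nthreads) {
    double t0 = now_sec();
    int ok_main = calloc_loop(ITERATIONS * nthreads, ALLOC_BYTES);
    double t1 = now_sec();
    int ok_thr = calloc_in_threads(nthreads, ITERATIONS, ALLOC_BYTES);
    double t2 = now_sec();
    printf("calling thread: %d allocations in %.3f s\n", ok_main, t1 - t0);
    printf("%d threads:      %d allocations in %.3f s\n", nthreads, ok_thr, t2 - t1);
}
```

Calling `compare_main_vs_threads(4)` from `main` and building with something like `gcc -O2 -pthread` should print both timings. If the threaded run is much slower for the same number of allocations, that would point at the allocator; if not, the problem is probably elsewhere.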
