Thanks for the ideas. I did not get much info from running with the Boehm GC 
(`--gc:boehm`). When I have more time, I'll try `--gc:markAndSweep`, and also 
`--tlsEmulation:on`.
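For reference, these are the standard Nim compiler switches I mean (`app.nim` is a placeholder for the actual program):

```shell
# Sketch of the builds I plan to compare (flag names per the Nim compiler docs):
nim c -d:release --gc:boehm app.nim         # conservative Boehm collector
nim c -d:release --gc:markAndSweep app.nim  # mark-and-sweep collector
nim c -d:release --tlsEmulation:on app.nim  # software thread-local storage
```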

> It's often best to limit the number of threads to the number of cores on a 
> processor...

I don't know how to do that; Nim's threadpool does not offer a way. The 
previous solution used Python multiprocessing plus Cython. You could argue that 
it avoids NUMA problems by using separate processes, but it also did not suffer 
from memory growth. Possibly calloc/free use mmap/munmap for large blocks 
there.

Interestingly, my little Nim example on OS X sped up after total memory usage 
quickly stabilized. So maybe that allocator is better optimized for large, 
repeated alloc/free cycles.

But I don't know whether the Linux slow-down comes from the large allocations, 
the many small allocations, or the basic processing.

> If you're allocating lots of very large blocks of memory, fragmentation is 
> going to hurt you sooner or later. The only solution for that would be a 
> compacting garbage collector. I'm not sure what you're allocating in your 
> actual code.

Yeah, that's what I think is going on. I would love to be able to dump the 
freelist lengths when some flag is set, so I wouldn't have to wonder. Would 
such a change likely be accepted on GitHub?
