I mentioned the NUMA problems because you said you were experiencing poor performance running on 48 cores, which might be caused by memory locality issues. There's little you can do about that within Nim (other than through OS-specific system calls). You probably want to limit execution to just the cores of a single processor with `numactl` (for example, something like `numactl --cpunodebind=0 --membind=0 ./yourprogram`, where the binary name is a placeholder) and check whether you still experience the slowdown.
What is puzzling about the memory situation in general is that your free space seems to explode. It wouldn't be surprising if total heap space (live objects + free space) went up as a result of fragmentation, but an explosion in free space would mean that you've hit a pathological case. Conservative scanning is unlikely to be the cause, because the amount of memory it wastes is generally bounded. This is why I suggested trying `--gc:boehm` and `--gc:markandsweep`: not because they might perform better, but to narrow down the diagnosis.
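To see how live memory, free space, and total heap develop over time, something along these lines might help. It is only a rough sketch: `reportHeap` and the allocation loop are made up for illustration and stand in for your real workload. The three `get*Mem` procs come from Nim's `system` module.

```nim
# Hypothetical diagnostic helper; the name `reportHeap` is illustrative only.
proc reportHeap(label: string) =
  let occupied = getOccupiedMem() # bytes owned by the process that hold data
  let free = getFreeMem()         # bytes owned by the process that hold no data
  let total = getTotalMem()       # all bytes owned by the process
  echo label, ": occupied=", occupied, " free=", free, " total=", total

# Placeholder workload: allocate many strings, drop them, collect, report.
var data: seq[string] = @[]
reportHeap("start")
for round in 1..5:
  for i in 1..100_000:
    data.add("item " & $i)
  data.setLen(0)      # drop the references so the strings become garbage
  GC_fullCollect()    # force a full collection before measuring
  reportHeap("after round " & $round)
```

Building the same file once with the default collector, once with `--gc:boehm`, and once with `--gc:markandsweep`, then comparing how the `free` figure grows, should show whether the explosion is specific to one collector. Keep in mind that each collector computes these numbers in its own way, so compare trends rather than absolute values.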
