cryintotheblue...@gmail.com (Sad Clouds) writes:
>Looked at disassembly of memcpy() and NetBSD version looks way more
>complicated. I don't know anything about x86 assembly, but maybe the
>clue is somewhere here:
The Linux code shown is incomplete. But that can't be relevant to your problem.

munlock fails when the whole range has not been locked. Since the range is rounded to page boundaries, there could be some overlap.

The memcpy speed is obviously influenced by the caches. Multiple threads can easily cause thrashing, and the memory allocator may make a difference.

Another effect on your system is NUMA. Linux will allocate memory on the NUMA node local to the CPU that requests it when possible. NetBSD has no idea about NUMA. On your system that can easily have a 20-30% impact on memcpy speed.

If a thread sleeps, it is either doing a system call, or the scheduler isn't allocating a CPU to it. The latter shouldn't happen in netbsd-8 for CPU-bound user threads. But without seeing your code, it's difficult to tell what happens.

-- 
-- 
                                Michael van Elst
Internet: mlel...@serpens.de
                                "A potential Snark may lurk in every tree."