On Mon, 19 Nov 2018 01:06:45 -0000 (UTC) mlel...@serpens.de (Michael van Elst) wrote:
> munlock fails when not the whole range has been locked, Since the
> range is rounded to page boundaries, there could be some overlap.

Are you referring to the virtual or the physical address range? As far
as I remember, all the memory ranges were powers of 2 and much larger
than 4 KiB. Maybe the buffers have to be aligned on a page boundary;
I'll see if changing malloc to posix_memalign helps.

> Another effect on your system is NUMA. Linux will allocate memory
> on the CPU that requests it when possible. NetBSD has no idea about
> NUMA. On your system that can easily have a 20-30% impact on memcpy
> speed.
>
> If a thread sleeps, it either is doing a system call, or the scheduler
> doesn't allocate a CPU for it. The latter shouldn't happen in netbsd-8
> for CPU bound user threads.
>
> But without seeing your code, it's difficult to tell what happens.

The speed difference is about 2.5 times, so far more than the 30% you
mentioned. Also, it is a simple loop that calls memcpy, with no syscalls
of any kind, but for some reason the threads are idle 60% of the time.
I'll run some more tests and provide more details.
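
For reference, the posix_memalign change mentioned above could look
roughly like the sketch below. This is not the actual test code: the
helper names alloc_page_aligned and lock_fill_unlock are made up for
illustration. The only idea it shows is that the buffer starts on a
page boundary and that munlock is called on exactly the range that was
passed to mlock.

```c
#define _POSIX_C_SOURCE 200112L

#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: allocate len bytes starting on a page boundary.
 * Returns NULL on failure; release the buffer with free(). */
void *alloc_page_aligned(size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    void *p = NULL;

    assert(page > 0);
    if (posix_memalign(&p, (size_t)page, len) != 0)
        return NULL;
    return p;
}

/* Hypothetical helper: lock a range, touch it, and unlock the very same
 * range. Returns 0 on success, -1 if mlock or munlock fails. */
int lock_fill_unlock(void *buf, size_t len)
{
    if (mlock(buf, len) != 0)
        return -1;              /* e.g. RLIMIT_MEMLOCK is too low */
    memset(buf, 0xa5, len);     /* stand-in for the real memcpy loop */
    return munlock(buf, len);
}
```

If len is also a multiple of the page size, the locked range should not
share a page with a neighbouring allocation, which is the overlap case
described in the quoted text. Whether that is what makes munlock fail
here is still only a guess.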