On Wed, Mar 25, 2015 at 09:21:10PM -0400, Daniel Micay wrote:
> > I didn't follow this thread. However, since you mentioned that MADV_FREE
> > causes many page faults, I'm jumping in here.
> > One of the benefits of MADV_FREE in the current implementation is that
> > it avoids page faults as well as zeroing.
> > Why did you see many page faults?
>
> I think I just misunderstood why it was still so much slower than not
> using purging at all.
>
> >> I get ~20k requests/s with jemalloc on the ebizzy benchmark with this
> >> dual core ivy bridge laptop. It jumps to ~60k requests/s with MADV_FREE
> >> IIRC, but disabling purging via MALLOC_CONF=lg_dirty_mult:-1 leads to
> >> 3.5 *million* requests/s. It has a similar impact with TCMalloc.
> >
> > > When I tested MADV_FREE with ebizzy, I saw a similar result: two or three
> > > times faster than MADV_DONTNEED. But it is not free. MADV_FREE has a cost
> > > of its own (i.e., walking all of the page table entries in the range,
> > > clearing the dirty bit, and flushing the TLB). Of course, it also takes
> > > mmap_sem with the read-side lock.
> > > If you see a great improvement when you disable purging, I guess it is
> > > mainly because mmap_sem is not taken, so some threads can allocate while
> > > other threads handle page faults. The reason I think so is that I saw a
> > > similar result when I implemented the vrange syscall, which holds the
> > > mmap_sem read-side lock for a very short time (i.e., marking the range
> > > volatile in the vma, which is O(1)), while MADV_FREE holds the lock while
> > > enumerating all of the pages in the range, which is O(N).
>
> It stops doing mmap after getting warmed up since it never unmaps so I
> don't think mmap_sem is a contention issue. It could just be caused by
> the cost of the system call itself and TLB flush. I found perf to be
> fairly useless in identifying where the time was being spent.
>
> It might be much more important to purge very large ranges in one go
> with MADV_FREE. It's a different direction than the current compromises
> forced by MADV_DONTNEED.
>
I tested ebizzy with a recent jemalloc in my KVM guest.
Clearly, no purging was best (4925 records/s), while purging with
MADV_DONTNEED was worst (1814 records/s).
However, on my machine, purging with MADV_FREE was not as bad as on yours:
4338 records/s vs 4925 records/s.
No purging still wins, but considering the difference in the number of
madvise syscalls between no purging and MADV_FREE purging, MADV_FREE could
do better than it does now:
24 vs 43408
One thing I am wondering is why the madvise syscall count increases
when we turn on MADV_FREE compared to MADV_DONTNEED. Could it be an
aggressive dirty page purging rule inside jemalloc?
Anyway, my point is that the gap between MADV_FREE and no purging on my
machine is not as large as the one you reported.
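For anyone following along, the purge step being compared here can be
sketched in C roughly as follows. This is only a minimal sketch, not
jemalloc's actual code: map_dirty() and purge_range() are hypothetical names,
and it assumes headers that define MADV_FREE (it falls back to MADV_DONTNEED
when the flag is missing or the kernel rejects it).

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: map len bytes of private anonymous memory and
 * dirty every page. Returns NULL on failure.
 */
unsigned char *map_dirty(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    memset(p, 0xa5, len);
    return p;
}

/*
 * Hypothetical helper: give a page-aligned dirty range back to the kernel.
 * lazy != 0 tries MADV_FREE first (pages stay mapped until reclaim needs
 * them); otherwise, or if that fails, MADV_DONTNEED drops them at once.
 * Returns 0 on success, -1 on error, like madvise() itself.
 */
int purge_range(void *addr, size_t len, int lazy)
{
#ifdef MADV_FREE
    if (lazy && madvise(addr, len, MADV_FREE) == 0)
        return 0;
#endif
    (void)lazy; /* unused when MADV_FREE is not defined */
    return madvise(addr, len, MADV_DONTNEED);
}
```

Either way it is one syscall per range, so how the allocator batches dirty
ranges decides how many madvise calls show up in strace.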
********
#> lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 12
On-line CPU(s) list: 0-11
Thread(s) per core: 2
Core(s) per socket: 6
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 45
Stepping: 7
CPU MHz: 1200.000
BogoMIPS: 6399.71
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 12288K
NUMA node0 CPU(s): 0-11
*****
ebizzy 0.2
(C) 2006-7 Intel Corporation
(C) 2007 Valerie Henson <[email protected]>
always_mmap 0
never_mmap 0
chunks 10
prevent coalescing using permissions 0
prevent coalescing using holes 0
random_size 0
chunk_size 5242880
seconds 10
threads 24
verbose 1
linear 0
touch_pages 0
page size 4096
Allocated memory
Wrote memory
Threads starting
Threads finished
******
jemalloc git head
commit 65db63cf3f0c5dd5126a1b3786756486eaf931ba
Author: Jason Evans <[email protected]>
Date: Wed Mar 25 18:56:55 2015 -0700
Fix in-place shrinking huge reallocation purging bugs.
******
1) LD_PRELOAD="/jemalloc/lib/libjemalloc.so.dontneed" strace -c -f ./ebizzy -s $((5<<20))
1814 records/s
real 10.00 s
user 28.18 s
sys 90.08 s
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 90.78   99.368420        5469     18171           madvise
  9.14   10.001131    10001131         1           nanosleep
  0.05    0.050037         807        62        10 futex
  0.03    0.031721         291       109           mmap
  0.00    0.004455         178        25           set_robust_list
  0.00    0.000129           5        24           clone
  0.00    0.000000           0         4           read
  0.00    0.000000           0         1           write
  0.00    0.000000           0         6           open
  0.00    0.000000           0         6           close
  0.00    0.000000           0         6           fstat
  0.00    0.000000           0        32           mprotect
  0.00    0.000000           0        35           munmap
  0.00    0.000000           0         2           brk
  0.00    0.000000           0         3           rt_sigaction
  0.00    0.000000           0         3           rt_sigprocmask
  0.00    0.000000           0         4         3 access
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         1         1 readlink
  0.00    0.000000           0         1           getrlimit
  0.00    0.000000           0         2           getrusage
  0.00    0.000000           0         1           arch_prctl
  0.00    0.000000           0         1           set_tid_address
------ ----------- ----------- --------- --------- ----------------
100.00  109.455893                 18501        14 total
2) LD_PRELOAD="/jemalloc/lib/libjemalloc.so.dontneed" MALLOC_CONF=lg_dirty_mult:-1 strace -c -f ./ebizzy -s $((5<<20))
4925 records/s
real 10.00 s
user 119.83 s
sys 0.16 s
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 82.73    0.821804       15804        52         6 futex
 15.70    0.156000      156000         1           nanosleep
  1.53    0.015186         115       132           mmap
  0.04    0.000349           4        87           munmap
  0.00    0.000000           0         4           read
  0.00    0.000000           0         1           write
  0.00    0.000000           0         6           open
  0.00    0.000000           0         6           close
  0.00    0.000000           0         6           fstat
  0.00    0.000000           0        32           mprotect
  0.00    0.000000           0         2           brk
  0.00    0.000000           0         3           rt_sigaction
  0.00    0.000000           0         3           rt_sigprocmask
  0.00    0.000000           0         4         3 access
  0.00    0.000000           0        24           madvise
  0.00    0.000000           0        24           clone
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         1         1 readlink
  0.00    0.000000           0         1           getrlimit
  0.00    0.000000           0         2           getrusage
  0.00    0.000000           0         1           arch_prctl
  0.00    0.000000           0         1           set_tid_address
  0.00    0.000000           0        25           set_robust_list
------ ----------- ----------- --------- --------- ----------------
100.00    0.993339                   419        10 total
3) LD_PRELOAD="/jemalloc/lib/libjemalloc.so.free" strace -c -f ./ebizzy -s $((5<<20))
4338 records/s
real 10.00 s
user 91.40 s
sys 12.58 s
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 78.39   36.433483         839     43408           madvise
 21.53   10.004889    10004889         1           nanosleep
  0.04    0.020472         394        52        15 futex
  0.03    0.015464         145       107           mmap
  0.00    0.000041           2        24           clone
  0.00    0.000000           0         4           read
  0.00    0.000000           0         1           write
  0.00    0.000000           0         6           open
  0.00    0.000000           0         6           close
  0.00    0.000000           0         6           fstat
  0.00    0.000000           0        32           mprotect
  0.00    0.000000           0        33           munmap
  0.00    0.000000           0         2           brk
  0.00    0.000000           0         3           rt_sigaction
  0.00    0.000000           0         3           rt_sigprocmask
  0.00    0.000000           0         4         3 access
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         1         1 readlink
  0.00    0.000000           0         1           getrlimit
  0.00    0.000000           0         2           getrusage
  0.00    0.000000           0         1           arch_prctl
  0.00    0.000000           0         1           set_tid_address
  0.00    0.000000           0        25           set_robust_list
------ ----------- ----------- --------- --------- ----------------
100.00   46.474349                 43724        19 total
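To put the three runs side by side, the figures above can be reduced to a
couple of ratios. A small sketch in C; the helper names are made up for
illustration, and the inputs are copied from the ebizzy results and the
strace summaries above.

```c
#include <stdio.h>

/* Throughput of one run as a fraction of a baseline run. */
double relative_throughput(double records_per_s, double baseline)
{
    return records_per_s / baseline;
}

/* Average cost of one syscall in microseconds, from strace -c columns. */
double usecs_per_call(double seconds, long calls)
{
    return seconds * 1e6 / (double)calls;
}

/* Print the comparison, using no purging (4925 records/s) as the baseline. */
void print_summary(void)
{
    const double no_purge = 4925.0;

    printf("MADV_DONTNEED: %.1f%% of no-purge, %.0f us per madvise\n",
           100.0 * relative_throughput(1814.0, no_purge),
           usecs_per_call(99.368420, 18171));
    printf("MADV_FREE:     %.1f%% of no-purge, %.0f us per madvise\n",
           100.0 * relative_throughput(4338.0, no_purge),
           usecs_per_call(36.433483, 43408));
}
```

With these inputs MADV_FREE reaches roughly 88% of the no-purge throughput
and MADV_DONTNEED roughly 37%, while each madvise call is several times
cheaper with MADV_FREE even though there are more of them.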
--
Kind regards,
Minchan Kim