On 06/19/2015 04:33 PM, Andi Kleen wrote:
>> I *think* we can avoid taking the srcu_read_lock() for the
>> common case where there are no actual marks on the file
>> being modified *or* the vfsmount.
> What is so expensive in it? Just the memory barrier in it?

The profile doesn't attribute any samples to the mfence directly, but I
assume that the overhead is coming from there. The "mov 0x8(%rdi),%rcx"
is identical before and after the barrier, but it appears much more
expensive _after_ (7.51% vs. 0.58%). That makes no sense unless the
barrier is the thing causing it.

Here's how the annotation mode of 'perf top' breaks it down:

>       │    ffffffff810fb480 <load0>:
>       │      nop
>       │      mov    (%rdi),%rax
>  0.58 │      push   %rbp
>       │      incl   %gs:0x7ef0f488(%rip)
>  1.73 │      mov    %rsp,%rbp
>       │      and    $0x1,%eax
>       │      movslq %eax,%rdx
>  0.58 │      mov    0x8(%rdi),%rcx
>       │      incq   %gs:(%rcx,%rdx,8)
>       │      mfence
> 69.94 │      add    $0x2,%rdx
>  7.51 │      mov    0x8(%rdi),%rcx
>  4.05 │      incq   %gs:(%rcx,%rdx,8)
> 13.87 │      decl   %gs:0x7ef0f45f(%rip)
>       │      pop    %rbp
>  1.73 │    ← retq
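
For anyone who wants to map the annotation back to C, here is a rough
userspace model of the pattern, not the kernel source. It assumes the
read-side lock amounts to "pick an index from the low bit, bump a counter,
full barrier, bump a second counter", which is what the disassembly above
suggests. Every name in it (model_srcu, model_read_lock, model_notify, the
nr_*_marks fields) is made up for illustration, plain counters stand in for
the per-CPU ones, and the preempt-count inc/dec is left out. The last
function sketches the proposed fast path: skip the lock, and so the
barrier, when neither the file nor the vfsmount has any marks.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the SRCU state; plain counters replace per-CPU ones. */
struct model_srcu {
    unsigned long completed;  /* low bit selects the active counter slot */
    unsigned long c[2];       /* reader counts, one per slot */
    unsigned long seq[2];     /* second counter bumped after the barrier */
};

/* Hypothetical object being modified: mark counts for the file and its mount. */
struct model_object {
    unsigned int nr_inode_marks;
    unsigned int nr_mount_marks;
};

/* Mirrors the shape of the annotated code: inc, full fence, inc. */
static int model_read_lock(struct model_srcu *sp)
{
    int idx = (int)(sp->completed & 0x1);

    sp->c[idx]++;
    /* Full barrier; on x86 compilers typically emit mfence or a locked op. */
    atomic_thread_fence(memory_order_seq_cst);
    sp->seq[idx]++;
    return idx;
}

static void model_read_unlock(struct model_srcu *sp, int idx)
{
    assert(idx == 0 || idx == 1);
    atomic_thread_fence(memory_order_seq_cst);
    sp->c[idx]--;
}

/* Returns true if the slow path (lock + barrier) was taken. */
static bool model_notify(struct model_srcu *sp, const struct model_object *obj)
{
    int idx;

    /* Fast path: no marks on the file or the mount, so nothing to walk. */
    if (obj->nr_inode_marks == 0 && obj->nr_mount_marks == 0)
        return false;

    idx = model_read_lock(sp);
    /* ... the mark lists would be walked here ... */
    model_read_unlock(sp, idx);
    return true;
}
```

In this model a call on an object with no marks returns before touching
any counter, so the cost of the barrier is only paid when there is
something to notify.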