From: [EMAIL PROTECTED] Date: Mon, 19 Nov 2007 17:11:32 -0800 > Before: > > mov %gs:0x8,%rdx Get smp_processor_id > mov tableoffset,%rax Get table base > incq varoffset(%rax,%rdx,1) Perform the operation with a complex lookup > adding the var offset > > An interrupt or a reschedule action can move the execution thread to another > processor if interrupt or preempt is not disabled. Then the variable of > the wrong processor may be updated in a racy way. > > After: > > incq %gs:varoffset(%rip) > > Single instruction that is safe from interrupts or moving of the execution > thread. It will reliably operate on the current processors data area. > > Other platforms can also perform address relocation plus atomic ops on > a memory location. Exploiting of the atomicity of instructions vs interrupts > is therefore possible and will reduce the cpu op processing overhead. > > F.e on IA64 we have per cpu virtual mapping of the per cpu area. If > we add an offset to the per cpu area variable address then we can guarantee > that we always hit the per cpu areas local to a processor. > > Other platforms (SPARC?) have registers that can be used to form addresses. > If the cpu area address is in one of those then atomic per cpu modifications > can be generated for those platforms in the same way.
Although we have a per-cpu area base in a fixed global register for addressing, the above isn't beneficial on sparc64 because the atomic is much slower than doing a: local_irq_disable(); nonatomic_percpu_memory_op(); local_irq_enable(); local_irq_{disable,enable}() together is about 18 cycles. Just the cmpxchg() part of the atomic sequence is at least 32 cycles and requires a loop: while (1) { x = ld(); if (cmpxchg(x, op(x))) break; } which bloats up the atomic version even more. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/