The documentation is out of date. pmap_kenter() synchronizes the TLB on all CPUs.
pmap_kenter_quick() only invalidates the TLB on the current CPU. pmap_kenter_noinval() does not touch the TLB at all. The latter two functions are only used under controlled circumstances. pmap_kenter_quick() is used when the caller only intends to use the mapping on the current CPU, or intends to track which CPUs have synchronized the mapping. pmap_kenter_noinval() is used when the caller is entering a large number of PTEs into the pmap and will do a global invalidation when done. pmap_qenter() passes an argument indicating what kind of TLB invalidation is desired.

I should note that the buffer cache code (struct buf) tracks mapping validity with a CPU mask, so operations on the buffer cache do not cause an excessive number of global IPIs for SMP invalidations.

-Matt
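
To make the difference between the three invalidation styles concrete, here is a rough user-space sketch. It is not kernel code: every toy_* name, the NCPUS value, and the "one IPI per remote CPU" accounting are assumptions made up for illustration, and the real pmap functions have different signatures. It only models the bookkeeping idea of a per-mapping CPU mask that records which CPUs have a synchronized view.

```c
#include <assert.h>
#include <stdint.h>

#define NCPUS    4                      /* assumed CPU count for the model */
#define ALL_CPUS ((1u << NCPUS) - 1)

/* Toy mapping: the "PTE" contents plus a mask of CPUs whose TLB is in sync. */
struct toy_mapping {
    uintptr_t pa;       /* physical address currently entered */
    uint32_t  synced;   /* bit n set => CPU n holds no stale entry */
};

static unsigned ipi_count;              /* cross-CPU interrupts "sent" so far */

/* Model of a global shootdown: assume one IPI per remote CPU. */
static void toy_global_invalidate(void)
{
    ipi_count += NCPUS - 1;
}

/* Like pmap_kenter() as described above: enter, then sync every CPU. */
static void toy_kenter(struct toy_mapping *m, uintptr_t pa)
{
    m->pa = pa;
    toy_global_invalidate();
    m->synced = ALL_CPUS;
}

/* Like pmap_kenter_quick(): only the current CPU is invalidated, no IPIs. */
static void toy_kenter_quick(struct toy_mapping *m, uintptr_t pa, int curcpu)
{
    m->pa = pa;
    m->synced = 1u << curcpu;
}

/* Like pmap_kenter_noinval(): no TLB work at all; every CPU may be stale. */
static void toy_kenter_noinval(struct toy_mapping *m, uintptr_t pa)
{
    m->pa = pa;
    m->synced = 0;
}

/* Caller-side tracking: a CPU not yet in the mask does a local
 * invalidation (no IPI) before using the mapping, then records itself. */
static void toy_use_on_cpu(struct toy_mapping *m, int cpu)
{
    if ((m->synced & (1u << cpu)) == 0)
        m->synced |= 1u << cpu;         /* stands in for a local invlpg */
}

/* Batch entry: n mappings entered without invalidation, then a single
 * global invalidation at the end covers all of them. */
static void toy_batch_enter(struct toy_mapping *maps, int n, uintptr_t base)
{
    for (int i = 0; i < n; i++)
        toy_kenter_noinval(&maps[i], base + (uintptr_t)i * 4096);
    toy_global_invalidate();
    for (int i = 0; i < n; i++)
        maps[i].synced = ALL_CPUS;
}
```

Under this model, entering 8 mappings one at a time with toy_kenter() costs 8 rounds of IPIs, while toy_batch_enter() costs one round. The toy_kenter_quick() plus toy_use_on_cpu() path costs none, which is the same idea as the buffer cache tracking validity with a CPU mask.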