On 03/21/2011 10:52 PM, Benjamin Herrenschmidt wrote:
> On Mon, 2011-03-21 at 11:24 +0000, Jeremy Fitzhardinge wrote:
>> I'm very sorry about that, I didn't realize power was also using that
>> interface. Unfortunately, the "no preemption" definition was an error,
>> and had to be changed to match the pre-existing locking rules.
>>
>> Could you implement a similar "flush batched pte updates on context
>> switch" as x86?
> Well, we already do that for -rt & co.
>
> However, we have another issue which is the reason we used those
> lazy_mmu hooks to do our flushing.
>
> Our PTEs eventually get faulted into a hash table which is what the real
> MMU uses. We must never (ever) allow that hash table to contain a
> duplicate entry for a given virtual address.
>
> When we do a batch, we remove things from the linux PTE, and keep a
> reference in our batch structure, and only update the hash table at the
> end of the batch.

Wouldn't implicitly ending a batch on context switch get the same effect?

> That means that we must not allow a hash fault to populate the hash with
> a "new" PTE value prior to the old one having been flushed out (which is
> possible if they different in protection attributes for example). For
> that to happen, we must basically not allow a page fault to re-populate
> a PTE invalidated by a batch before that batch has completed.

Kernel ptes are not generally populated on fault though, unless there's
something in power? On x86 it can happen when syncing a process's
kernel pmd with the init_mm one, but that shouldn't happen in the middle
of an update since you'd deadlock anyway. If a particular kernel
subsystem has its own locks to manage the ptes for a kernel mapping,
then that should prevent any nested updates within a batch shouldn't it?

> That translates to batches must only happen within a PTE lock section.

Well, in that case, I guess your best bet is to disable batching for
kernel pagetable updates. These apply_to_page_range() changes are the
first time any attempt to batch kernel pagetable updates has been made
(otherwise you would have seen this problem earlier), so not batching
them will not be a regression for you. But I'm not sure what the proper
fix to get batching in your case will be. But the assumption that
there's a pte lock for kernel ptes is not valid.

    J
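
For what it's worth, the window being discussed can be sketched with a
toy userspace model. This is only an illustration under stated
assumptions, not the powerpc hash code: ToyMMU, hash_fault(),
context_switch() and run() are hypothetical names, and the "hash table"
is a plain list so that duplicate entries are observable. It shows a PTE
invalidated inside a batch being re-populated before the batch is
flushed, and the same sequence with the batch implicitly ended first.

```python
class ToyMMU:
    """Toy model (hypothetical): Linux PTEs plus a hardware-style hash table."""

    def __init__(self):
        self.linux_pte = {}     # vaddr -> protection, what the kernel edits
        self.hash_entries = []  # (vaddr, protection) pairs, what the "MMU" uses
        self.batch = []         # vaddrs invalidated but not yet flushed
        self.in_batch = False

    def set_pte(self, vaddr, prot):
        self.linux_pte[vaddr] = prot

    def hash_fault(self, vaddr):
        """Copy the Linux PTE into the hash; no check for stale entries."""
        prot = self.linux_pte.get(vaddr)
        if prot is None:
            return False
        self.hash_entries.append((vaddr, prot))
        return True

    def enter_lazy_mmu(self):
        self.in_batch = True

    def invalidate(self, vaddr):
        """Clear the Linux PTE; defer the hash flush while batching."""
        self.linux_pte.pop(vaddr, None)
        if self.in_batch:
            self.batch.append(vaddr)
        else:
            self._flush_hash([vaddr])

    def _flush_hash(self, vaddrs):
        stale = set(vaddrs)
        self.hash_entries = [e for e in self.hash_entries if e[0] not in stale]

    def flush_batch(self):
        self._flush_hash(self.batch)
        self.batch = []

    def leave_lazy_mmu(self):
        self.flush_batch()
        self.in_batch = False

    def context_switch(self):
        # Assumption: a context switch implicitly ends the pending batch.
        self.flush_batch()

    def has_duplicate(self):
        vaddrs = [v for v, _ in self.hash_entries]
        return len(vaddrs) != len(set(vaddrs))


def run(flush_before_refault):
    """Return True if the hash ever holds two entries for one address."""
    mmu = ToyMMU()
    mmu.set_pte(0x1000, "rw")
    mmu.hash_fault(0x1000)          # old entry now lives in the hash

    mmu.enter_lazy_mmu()
    mmu.invalidate(0x1000)          # Linux PTE gone, hash entry still stale
    if flush_before_refault:
        mmu.context_switch()        # batch ended before anything re-faults

    mmu.set_pte(0x1000, "ro")       # PTE re-populated with new protection
    mmu.hash_fault(0x1000)
    duplicate = mmu.has_duplicate()
    mmu.leave_lazy_mmu()
    return duplicate


if __name__ == "__main__":
    print("re-fault inside batch, duplicate:", run(False))
    print("flush before re-fault, duplicate:", run(True))
```

In this model the flush avoids the duplicate only because the re-fault
happens to come after it. A fault that can land inside the batch with no
intervening flush still has to be excluded some other way, by a lock
around the batch or by not batching those updates at all.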