Andrew,

Could we get direction/guidance from you regarding the
invalidate_page() callout of Andrea's patch set versus the
invalidate_range_start/invalidate_range_end callout pairs of
Christoph's patch set? This is only in the context of the __xip_unmap,
do_wp_page, page_mkclean_one, and try_to_unmap_one call sites.

On Tue, Apr 22, 2008 at 03:48:47PM +0200, Andrea Arcangeli wrote:
> On Tue, Apr 22, 2008 at 08:36:04AM -0500, Robin Holt wrote:
> > I am a little confused about the value of the seq_lock versus a simple
> > atomic, but I assumed there is a reason and left it at that.
>
> There's no value for anything but get_user_pages (get_user_pages takes
> its own lock internally though). I preferred to explain it as a
> seqlock because it was simpler for reading, but I totally agree in the
> final implementation it shouldn't be a seqlock. My code was meant to
> be pseudo-code only. It doesn't even need to be atomic ;).

Unless there is additional locking in your fault path, I think it does
need to be atomic.

> > I don't know what you mean by "it'd" run slower and what you mean by
> > "armed and disarmed".
>
> 1) when armed the time-window where the kvm-page-fault would be
>    blocked would be a bit larger without invalidate_page for no good
>    reason

But that is a distinction without a difference. In the _start/_end
case, kvm's fault handler will not have any _DIRECT_ blocking, but
get_user_pages() had certainly better block waiting for some other lock
to prevent the process's pages being refaulted. I am no VM expert, but
that seems critical to having a consistent virtual address space.

Effectively, the kvm fault handler is delayed from the moment either
invalidate_page() or invalidate_range_start() is entered until the
_CALLER_ of the invalidate* method has unlocked. That time will remain
essentially identical in either case. I would argue you would be hard
pressed to even measure the difference.

> 2) if you were to remove invalidate_page when disarmed the VM could
>    would need two branches instead of one in various places

Those branches are conditional upon there being list entries. That
check should be extremely cheap. The vast majority of cases will have
no registered notifiers. The second check, for the _end callout, will
be served from cpu cache.

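To make the two points above concrete (the counter being atomic, and
the empty-list check being cheap), here is a minimal userspace sketch.
It is not code from either patch set: struct notifier, struct mm_model,
range_start(), range_end(), and fault_may_install() are hypothetical
names, and the sketch leaves out the recheck a real fault path would
need after get_user_pages().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: these types and names are illustrative only. */
struct notifier {
	struct notifier *next;
	atomic_int in_flight;	/* range invalidates between _start and _end */
};

struct mm_model {
	struct notifier *head;	/* NULL when no notifier is registered */
};

/* With an empty list the loop test is the only branch taken. */
static void range_start(struct mm_model *mm)
{
	for (struct notifier *n = mm->head; n != NULL; n = n->next)
		atomic_fetch_add(&n->in_flight, 1);
}

static void range_end(struct mm_model *mm)
{
	for (struct notifier *n = mm->head; n != NULL; n = n->next) {
		int prev = atomic_fetch_sub(&n->in_flight, 1);

		assert(prev > 0);	/* every _end must pair with a _start */
	}
}

/* Fault path: do not install a mapping while an invalidate is in flight. */
static bool fault_may_install(struct notifier *n)
{
	return atomic_load(&n->in_flight) == 0;
}
```

The counter is atomic because, in this model, nothing else serializes
the fault path against the invalidate callers. If the fault path
already held a lock that both sides take, a plain integer would do.
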
> I don't want to waste cycles if not wasting them improves performance
> both when armed and disarmed.

In summary, I think we have narrowed down the cost in the case of no
registered notifiers to being infinitesimal. The case of registered
notifiers is a distinction without a difference.

> > When I was discussing this difference with Jack, he reminded me that
> > the GRU, due to its hardware, does not have any race issues with the
> > invalidate_page callout simply doing the tlb shootdown and not modifying
> > any of its internal structures. He then put a caveat on the discussion
> > that _either_ method was acceptable as far as he was concerned. The real
> > issue is getting a patch in that satisfies all needs and not whether
> > there is a seperate invalidate_page callout.
>
> Sure, we have that patch now, I'll send it out in a minute, I was just
> trying to explain why it makes sense to have an invalidate_page too
> (which remains the only difference by now), removing it would be a
> regression on all sides, even if a minor one.

I think the GRU is the only compelling case I have heard for having a
separate invalidate_page. In the case of the GRU, the hardware enforces
a lifetime of the invalidate which covers all in-progress faults,
including ones where the hardware is informed after the flush of a PTE.
In all cases, once the GRU invalidate instruction is issued, all active
requests are invalidated. Future faults will be blocked in
get_user_pages(). Without that special feature of the hardware, I don't
think any code simplification exists. I, of course, reserve the right
to be wrong.

I believe the argument against a separate invalidate_page() callout was
Christoph's interpretation of Andrew's comments. I am not certain
Andrew was aware of these special aspects of the GRU hardware, or
whether they had been factored into the discussion at that point in
time.

Thanks,
Robin