On Wed, 12 Nov 2008, Andrea Arcangeli wrote:

> > O_DIRECT does not take a refcount on the page in order to prevent this?
>
> It definitely does, it's also the only thing it does.

Then page migration will not occur because there is an unresolved
reference.

> The whole point is that O_DIRECT can start the instruction after
> page_count returns as far as I can tell.

But there must still be a reference for the bio, and for whatever else
may be going on at the time, in order to perform the I/O operation.

> If you check the three emails I linked in answer to Andrew on the
> topic, we agree that O_DIRECT can't start under the PT lock (or under
> mmap_sem in write mode, but migrate.c rightfully takes the read
> mode). So the fix used in ksm page_wrprotect and in fork() is to check
> page_count vs page_mapcount inside the PT lock before doing anything
> on the pte. If you just mark the page wrprotect while O_DIRECT is in
> flight, that's enough for fork() to generate data corruption in the
> parent (not in the child, where the result would be undefined). But in
> the parent the result of the O_DIRECT read is defined, and it would
> never be corrupted if this were cached I/O. The moment the parent pte
> is marked readonly, a thread in the parent could write to the last
> 512 bytes of the page, causing the first 512 bytes coming from disk
> with O_DIRECT to be lost (the write triggers a COW before the I/O is
> complete, and the DMA completes on the old page).

Have you actually seen corruption, or is this conjecture? AFAICT the
page count is elevated while I/O is in progress, and thus this is safe.
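The check Andrea describes (compare page_count with page_mapcount under the PT lock before touching the pte) can be modeled in user space roughly as below. This is a minimal sketch under stated assumptions, not kernel code: `struct sim_page`, `struct sim_pte`, `has_unexplained_refs()` and `sim_wrprotect()` are hypothetical names, the PT lock is assumed to be held by the caller rather than modeled, and `known_refs` is a hypothetical stand-in for whatever references the real caller can account for. The idea shown is only that an extra, unexplained reference (such as an in-flight O_DIRECT pin) makes the caller back off and leave the pte writable, so a later write cannot COW the page away from under the DMA.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of a page; NOT the kernel's struct page. */
struct sim_page {
    int count;    /* models page_count(): all references to the page */
    int mapcount; /* models page_mapcount(): references from ptes */
};

/* Hypothetical model of a single pte mapping the page. */
struct sim_pte {
    bool writable;
};

/*
 * Assumes the caller holds the PT lock, so no new get_user_pages() pin can
 * be taken through this pte while we look. known_refs is an assumption:
 * references the caller can explain besides the pte mappings.
 */
static bool has_unexplained_refs(const struct sim_page *page, int known_refs)
{
    /* Sanity check: every pte mapping holds its own reference. */
    assert(page->count >= page->mapcount);
    return page->count != page->mapcount + known_refs;
}

/*
 * Returns true if the pte was write-protected, false if the caller backed
 * off because some extra reference (e.g. in-flight O_DIRECT) exists.
 */
static bool sim_wrprotect(struct sim_pte *pte, const struct sim_page *page,
                          int known_refs)
{
    if (has_unexplained_refs(page, known_refs))
        return false; /* leave the pte writable: DMA may still target page */
    pte->writable = false;
    return true;
}
```

For example, a page with `count = 2`, `mapcount = 1` and one known reference gets write-protected, while the same page with `count = 3` (one extra pin) is left alone.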
