On Mon, Oct 03, 2016 at 01:13:58PM +0200, Jan Kara wrote:
> On Mon 03-10-16 02:32:48, Christoph Hellwig wrote:
> > On Mon, Oct 03, 2016 at 10:15:49AM +0200, Jan Kara wrote:
> > > Yeah, so DAX path is special because it installs its own PTE directly from
> > > the fault handler which we don't do in any other case (only driver fault
> > > handlers commonly do this but those generally don't care about
> > > ->page_mkwrite or file mappings for that matter).
> > > 
> > > I don't say there are no simplifications or unifications possible, but I'd
> > > prefer to leave them for a bit later once the current churn with ongoing
> > > work somewhat settles...
> > 
> > Allright, let's keep it simple for now.  Being said this series clearly
> > is 4.9 material, but any chance to get a respin of the invalidate_pages
>
> Agreed (actually 4.10).
>
> > series as that might still be 4.8 material?
>
> The problem with invalidate_pages series is that it depends on the ability
> to clear the dirty bits in the radix tree of DAX mappings (i.e. the first
> series). Otherwise radix tree entries that get once dirty can never be safely
> evicted, invalidate_inode_pages2_range() will keep returning EBUSY and
> callers get confused (I've tried that few weeks ago).
>
> If I dropped patch 5/6 for 4.9 merge (i.e., we would still happily discard
> dirty radix tree entries from invalidate_inode_pages2_range()), things
> would run fine, just fsync() may miss to flush caches for some pages. I'm
> not sure that's much better than current status quo though. Thoughts?

I'm not sure if I'm understanding this correctly, but if you're saying that we
might end up in a case where fsync()/msync() would fail to properly flush
pages that are/should be dirty, I think this is a no-go.  That could result in
data corruption: a user calls fsync(), thinks they've reached a
synchronization point and goes on to update other metadata, then loses power.
The data they believed that earlier fsync() had flushed is gone, because it
was still in the CPU cache and never really made it out to media.
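
To make the failure mode concrete, here is a rough userspace sketch of the
pattern I have in mind.  It is only an illustration: the file layout
(payload at offset 0, an 8-byte "valid" flag near the end of the page), the
constants and the function name write_then_commit() are all made up, and
nothing here is taken from the patches under discussion.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define REGION_SIZE 4096  /* one page holding payload + commit flag */
#define FLAG_OFFSET 4088  /* hypothetical location of the "valid" flag */

/*
 * Hypothetical application pattern: store a payload through a shared
 * mapping, sync, and only then publish it by setting a commit flag.
 * Returns 0 on success, -1 on any failure.
 */
int write_then_commit(const char *path, const char *payload, size_t len)
{
	uint64_t valid = 1;
	int ret = -1;
	char *map;
	int fd;

	assert(len <= FLAG_OFFSET);

	fd = open(path, O_RDWR | O_CREAT, 0600);
	if (fd < 0)
		return -1;
	if (ftruncate(fd, REGION_SIZE) < 0) {
		close(fd);
		return -1;
	}

	map = mmap(NULL, REGION_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED) {
		close(fd);
		return -1;
	}

	/* Step 1: plain stores.  On a DAX mapping these may sit in CPU cache. */
	memcpy(map, payload, len);

	/*
	 * Step 2: the synchronization point.  The application assumes that
	 * once this returns, the payload is durable on media.
	 */
	if (msync(map, REGION_SIZE, MS_SYNC) < 0)
		goto out;

	/* Step 3: publish the payload by setting the commit flag, then sync. */
	memcpy(map + FLAG_OFFSET, &valid, sizeof(valid));
	if (msync(map, REGION_SIZE, MS_SYNC) < 0)
		goto out;

	ret = 0;
out:
	munmap(map, REGION_SIZE);
	close(fd);
	return ret;
}
```

If the msync() in step 2 returns success without the payload's cache lines
having been written back (say, because the dirty radix tree entry had been
discarded earlier), a power loss after step 3 can leave the flag on media
claiming the payload is valid when the payload itself never got there.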
Linux-nvdimm mailing list