Re: [PATCH 1/4] fs: fix data invalidation in the cleancache during direct IO

2017-04-20 Thread Ross Zwisler
On Thu, Apr 20, 2017 at 04:44:31PM +0200, Jan Kara wrote:
> On Thu 20-04-17 16:35:10, Jan Kara wrote:
> > On Wed 19-04-17 13:28:36, Ross Zwisler wrote:
> > > On Wed, Apr 19, 2017 at 06:11:31PM +0300, Andrey Ryabinin wrote:
> > > > On 04/18/2017 10:38 PM, Ross Zwisler wrote:
> > > > > On Fri, Apr 14, 2017 at 05:07:50PM +0300, Andrey Ryabinin wrote:
> > > > >> Some direct write fs hooks call invalidate_inode_pages2[_range]()
> > > > >> conditionally iff mapping->nrpages is not zero. If the page cache is empty,
> > > > >> a buffered read following a direct IO write would get stale data from
> > > > >> the cleancache.
> > > > >>
> > > > >> Also it doesn't feel right to check only for ->nrpages because
> > > > >> invalidate_inode_pages2[_range] invalidates exceptional entries as well.
> > > > >>
> > > > >> Fix this by calling invalidate_inode_pages2[_range]() regardless of nrpages
> > > > >> state.
> > > > >>
> > > > >> Fixes: c515e1fd361c ("mm/fs: add hooks to support cleancache")
> > > > >> Signed-off-by: Andrey Ryabinin 
> > > > >> ---
> > > > > <>
> > > > >> diff --git a/fs/dax.c b/fs/dax.c
> > > > >> index 2e382fe..1e8cca0 100644
> > > > >> --- a/fs/dax.c
> > > > >> +++ b/fs/dax.c
> > > > >> @@ -1047,7 +1047,7 @@ dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
> > > > >>   * into page tables. We have to tear down these mappings so that data
> > > > >>   * written by write(2) is visible in mmap.
> > > > >>   */
> > > > >> -if ((iomap->flags & IOMAP_F_NEW) && inode->i_mapping->nrpages) {
> > > > >> +if ((iomap->flags & IOMAP_F_NEW)) {
> > > > >>  invalidate_inode_pages2_range(inode->i_mapping,
> > > > >>pos >> PAGE_SHIFT,
> > > > >>(end - 1) >> PAGE_SHIFT);
> > > > > 
> > > > > tl;dr: I think the old code is correct, and that you don't need this change.
> > > > > 
> > > > > This should be harmless, but could slow us down a little if we keep
> > > > > calling invalidate_inode_pages2_range() without really needing to.  Really for
> > > > > DAX I think we need to call invalidate_inode_pages2_range() only if we have
> > > > > zero pages mapped over the place where we are doing I/O, which is why we check
> > > > > nrpages.
> > > > > 
> > > > 
> > > > The check for ->nrpages only looks strange, because
> > > > invalidate_inode_pages2_range() also invalidates exceptional radix tree
> > > > entries. Is it correct that we invalidate exceptional entries only if
> > > > ->nrpages > 0 and skip invalidation otherwise?
> > > 
> > > For DAX we only invalidate clean DAX exceptional entries so that we can keep
> > > dirty entries around for writeback, but yes you're correct that we only do the
> > > invalidation if nrpages > 0.  And yes, it does seem a bit weird. :)
> > 
> > Actually in this place the nrpages check is deliberate since there should
> > only be hole pages or nothing in the invalidated range - see the comment
> > before the if. But thinking more about it this assumption actually is not
> > right in presence of zero PMD entries in the radix tree. So this change
> > actually also fixes a possible bug for DAX but we should do it as a
> > separate patch with a proper changelog.
> 
> Something like the attached patch. Ross?

Yep, great catch, this is a real issue.  The attached patch isn't sufficient,
though, because invalidate_inode_pages2_range() for DAX exceptional entries
only wipes out the radix tree entry, and doesn't call unmap_mapping_range() as
it does in the case of real pages.

I'm working on a fix and an associated xfstest test.
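
A minimal sketch of the missing step described above, hedged: dax_wipe_entry()
is a hypothetical placeholder rather than the real fs/dax.c API, and this is
not the eventual upstream fix. Wiping the radix tree entry alone leaves stale
PTEs/PMDs in the page tables, so sufficient invalidation of a DAX range also
has to unmap it, as is already done for real pages:

static void dax_invalidate_entry_range(struct address_space *mapping,
				       pgoff_t start, pgoff_t end)
{
	/* 1) Drop the clean DAX exceptional radix tree entries. */
	dax_wipe_entry(mapping, start, end);	/* hypothetical helper */

	/*
	 * 2) Tear down the page table mappings over the range so that mmap
	 *    readers re-fault and see the data written by write(2).
	 */
	unmap_mapping_range(mapping, (loff_t)start << PAGE_SHIFT,
			    (loff_t)(end - start + 1) << PAGE_SHIFT, 0);
}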


Re: [PATCH 1/4] fs: fix data invalidation in the cleancache during direct IO

2017-04-20 Thread Jan Kara
On Thu 20-04-17 16:35:10, Jan Kara wrote:
> On Wed 19-04-17 13:28:36, Ross Zwisler wrote:
> > On Wed, Apr 19, 2017 at 06:11:31PM +0300, Andrey Ryabinin wrote:
> > > On 04/18/2017 10:38 PM, Ross Zwisler wrote:
> > > > On Fri, Apr 14, 2017 at 05:07:50PM +0300, Andrey Ryabinin wrote:
> > > >> Some direct write fs hooks call invalidate_inode_pages2[_range]()
> > > >> conditionally iff mapping->nrpages is not zero. If the page cache is empty,
> > > >> a buffered read following a direct IO write would get stale data from
> > > >> the cleancache.
> > > >>
> > > >> Also it doesn't feel right to check only for ->nrpages because
> > > >> invalidate_inode_pages2[_range] invalidates exceptional entries as well.
> > > >>
> > > >> Fix this by calling invalidate_inode_pages2[_range]() regardless of nrpages
> > > >> state.
> > > >>
> > > >> Fixes: c515e1fd361c ("mm/fs: add hooks to support cleancache")
> > > >> Signed-off-by: Andrey Ryabinin 
> > > >> ---
> > > > <>
> > > >> diff --git a/fs/dax.c b/fs/dax.c
> > > >> index 2e382fe..1e8cca0 100644
> > > >> --- a/fs/dax.c
> > > >> +++ b/fs/dax.c
> > > >> @@ -1047,7 +1047,7 @@ dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
> > > >> * into page tables. We have to tear down these mappings so that data
> > > >> * written by write(2) is visible in mmap.
> > > >> */
> > > >> -  if ((iomap->flags & IOMAP_F_NEW) && inode->i_mapping->nrpages) {
> > > >> +  if ((iomap->flags & IOMAP_F_NEW)) {
> > > >>invalidate_inode_pages2_range(inode->i_mapping,
> > > >>  pos >> PAGE_SHIFT,
> > > >>  (end - 1) >> PAGE_SHIFT);
> > > > 
> > > > tl;dr: I think the old code is correct, and that you don't need this change.
> > > > 
> > > > This should be harmless, but could slow us down a little if we keep
> > > > calling invalidate_inode_pages2_range() without really needing to.  Really for
> > > > DAX I think we need to call invalidate_inode_pages2_range() only if we have
> > > > zero pages mapped over the place where we are doing I/O, which is why we check
> > > > nrpages.
> > > > 
> > > 
> > > The check for ->nrpages only looks strange, because
> > > invalidate_inode_pages2_range() also invalidates exceptional radix tree
> > > entries. Is it correct that we invalidate exceptional entries only if
> > > ->nrpages > 0 and skip invalidation otherwise?
> > 
> > For DAX we only invalidate clean DAX exceptional entries so that we can keep
> > dirty entries around for writeback, but yes you're correct that we only do the
> > invalidation if nrpages > 0.  And yes, it does seem a bit weird. :)
> 
> Actually in this place the nrpages check is deliberate since there should
> only be hole pages or nothing in the invalidated range - see the comment
> before the if. But thinking more about it this assumption actually is not
> right in presence of zero PMD entries in the radix tree. So this change
> actually also fixes a possible bug for DAX but we should do it as a
> separate patch with a proper changelog.

Something like the attached patch. Ross?

Honza
-- 
Jan Kara 
SUSE Labs, CR
From da79b4b72a6fe5fcf1a554ca1ce77cb462e8a306 Mon Sep 17 00:00:00 2001
From: Jan Kara 
Date: Thu, 20 Apr 2017 16:38:20 +0200
Subject: [PATCH] dax: Fix inconsistency between mmap and write(2)

When a process has a PMD-sized hole mapped via mmap and later allocates
part of the file underlying this area using write(2), memory mappings
need not be appropriately invalidated if the file has no hole pages
allocated, and thus the view via mmap will not show the data written by
write(2). Fix the problem by always invalidating memory mappings
covering the part of the file for which blocks got allocated.

Signed-off-by: Jan Kara 
---
 fs/dax.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 85abd741253d..da7bc44e5725 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1028,11 +1028,11 @@ dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 		return -EIO;
 
 	/*
-	 * Write can allocate block for an area which has a hole page mapped
-	 * into page tables. We have to tear down these mappings so that data
-	 * written by write(2) is visible in mmap.
+	 * Write can allocate block for an area which has a hole page or zero
+	 * PMD entry in the radix tree.  We have to tear down these mappings so
+	 * that data written by write(2) is visible in mmap.
 	 */
-	if ((iomap->flags & IOMAP_F_NEW) && inode->i_mapping->nrpages) {
+	if (iomap->flags & IOMAP_F_NEW) {
 		invalidate_inode_pages2_range(inode->i_mapping,
 	  pos >> PAGE_SHIFT,
 	  (end - 1) >> PAGE_SHIFT);
-- 
2.12.0



Re: [PATCH 1/4] fs: fix data invalidation in the cleancache during direct IO

2017-04-20 Thread Jan Kara
On Wed 19-04-17 13:28:36, Ross Zwisler wrote:
> On Wed, Apr 19, 2017 at 06:11:31PM +0300, Andrey Ryabinin wrote:
> > On 04/18/2017 10:38 PM, Ross Zwisler wrote:
> > > On Fri, Apr 14, 2017 at 05:07:50PM +0300, Andrey Ryabinin wrote:
> > >> Some direct write fs hooks call invalidate_inode_pages2[_range]()
> > >> conditionally iff mapping->nrpages is not zero. If the page cache is empty,
> > >> a buffered read following a direct IO write would get stale data from
> > >> the cleancache.
> > >>
> > >> Also it doesn't feel right to check only for ->nrpages because
> > >> invalidate_inode_pages2[_range] invalidates exceptional entries as well.
> > >>
> > >> Fix this by calling invalidate_inode_pages2[_range]() regardless of nrpages
> > >> state.
> > >>
> > >> Fixes: c515e1fd361c ("mm/fs: add hooks to support cleancache")
> > >> Signed-off-by: Andrey Ryabinin 
> > >> ---
> > > <>
> > >> diff --git a/fs/dax.c b/fs/dax.c
> > >> index 2e382fe..1e8cca0 100644
> > >> --- a/fs/dax.c
> > >> +++ b/fs/dax.c
> > >> @@ -1047,7 +1047,7 @@ dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
> > >>   * into page tables. We have to tear down these mappings so that data
> > >>   * written by write(2) is visible in mmap.
> > >>   */
> > >> -if ((iomap->flags & IOMAP_F_NEW) && inode->i_mapping->nrpages) {
> > >> +if ((iomap->flags & IOMAP_F_NEW)) {
> > >>  invalidate_inode_pages2_range(inode->i_mapping,
> > >>pos >> PAGE_SHIFT,
> > >>(end - 1) >> PAGE_SHIFT);
> > > 
> > > tl;dr: I think the old code is correct, and that you don't need this change.
> > > 
> > > This should be harmless, but could slow us down a little if we keep
> > > calling invalidate_inode_pages2_range() without really needing to.  Really for
> > > DAX I think we need to call invalidate_inode_pages2_range() only if we have
> > > zero pages mapped over the place where we are doing I/O, which is why we check
> > > nrpages.
> > > 
> > 
> > The check for ->nrpages only looks strange, because
> > invalidate_inode_pages2_range() also invalidates exceptional radix tree
> > entries. Is it correct that we invalidate exceptional entries only if
> > ->nrpages > 0 and skip invalidation otherwise?
> 
> For DAX we only invalidate clean DAX exceptional entries so that we can keep
> dirty entries around for writeback, but yes you're correct that we only do the
> invalidation if nrpages > 0.  And yes, it does seem a bit weird. :)

Actually in this place the nrpages check is deliberate since there should
only be hole pages or nothing in the invalidated range - see the comment
before the if. But thinking more about it this assumption actually is not
right in presence of zero PMD entries in the radix tree. So this change
actually also fixes a possible bug for DAX but we should do it as a
separate patch with a proper changelog.

Honza
-- 
Jan Kara 
SUSE Labs, CR
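
A userspace sketch of the scenario Jan describes, hedged: the path and the
2MiB PMD size are illustrative assumptions, and error handling is omitted. A
PMD-sized hole is faulted via mmap (which can leave a zero PMD entry in the
radix tree while mapping->nrpages stays 0), then write(2) allocates blocks
over it; without the unconditional invalidate, the mmap view can keep showing
stale zeroes:

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	int fd = open("/mnt/dax/file", O_RDWR | O_CREAT, 0644); /* assumed DAX mount */
	size_t pmd_size = 2UL << 20;	/* one 2MiB PMD on x86-64 */
	char buf[4096], *map;

	ftruncate(fd, pmd_size);	/* the whole file is a hole */
	map = mmap(NULL, pmd_size, PROT_READ, MAP_SHARED, fd, 0);

	/* Fault the hole: DAX can install a zero PMD entry, nrpages stays 0. */
	printf("before write: %#x\n", (unsigned char)map[0]);

	memset(buf, 0xab, sizeof(buf));
	pwrite(fd, buf, sizeof(buf), 0);	/* allocates blocks (IOMAP_F_NEW) */

	/* With the bug, this can still print 0 instead of 0xab. */
	printf("after write:  %#x\n", (unsigned char)map[0]);

	munmap(map, pmd_size);
	close(fd);
	return 0;
}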


Re: [PATCH 1/4] fs: fix data invalidation in the cleancache during direct IO

2017-04-19 Thread Ross Zwisler
On Wed, Apr 19, 2017 at 06:11:31PM +0300, Andrey Ryabinin wrote:
> On 04/18/2017 10:38 PM, Ross Zwisler wrote:
> > On Fri, Apr 14, 2017 at 05:07:50PM +0300, Andrey Ryabinin wrote:
> >> Some direct write fs hooks call invalidate_inode_pages2[_range]()
> >> conditionally iff mapping->nrpages is not zero. If the page cache is empty,
> >> a buffered read following a direct IO write would get stale data from
> >> the cleancache.
> >>
> >> Also it doesn't feel right to check only for ->nrpages because
> >> invalidate_inode_pages2[_range] invalidates exceptional entries as well.
> >>
> >> Fix this by calling invalidate_inode_pages2[_range]() regardless of nrpages
> >> state.
> >>
> >> Fixes: c515e1fd361c ("mm/fs: add hooks to support cleancache")
> >> Signed-off-by: Andrey Ryabinin 
> >> ---
> > <>
> >> diff --git a/fs/dax.c b/fs/dax.c
> >> index 2e382fe..1e8cca0 100644
> >> --- a/fs/dax.c
> >> +++ b/fs/dax.c
> >> @@ -1047,7 +1047,7 @@ dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
> >> * into page tables. We have to tear down these mappings so that data
> >> * written by write(2) is visible in mmap.
> >> */
> >> -  if ((iomap->flags & IOMAP_F_NEW) && inode->i_mapping->nrpages) {
> >> +  if ((iomap->flags & IOMAP_F_NEW)) {
> >>invalidate_inode_pages2_range(inode->i_mapping,
> >>  pos >> PAGE_SHIFT,
> >>  (end - 1) >> PAGE_SHIFT);
> > 
> > tl;dr: I think the old code is correct, and that you don't need this change.
> > 
> > This should be harmless, but could slow us down a little if we keep
> > calling invalidate_inode_pages2_range() without really needing to.  Really for
> > DAX I think we need to call invalidate_inode_pages2_range() only if we have
> > zero pages mapped over the place where we are doing I/O, which is why we check
> > nrpages.
> > 
> 
> The check for ->nrpages only looks strange, because
> invalidate_inode_pages2_range() also invalidates exceptional radix tree
> entries. Is it correct that we invalidate exceptional entries only if
> ->nrpages > 0 and skip invalidation otherwise?

For DAX we only invalidate clean DAX exceptional entries so that we can keep
dirty entries around for writeback, but yes you're correct that we only do the
invalidation if nrpages > 0.  And yes, it does seem a bit weird. :)
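
A hedged sketch of the clean-versus-dirty rule described here; the helper and
its locking structure are placeholders, not the real fs/dax.c code. Dirty
radix tree entries must survive so that dax_writeback_mapping_range() can
still find them, so invalidation may only drop entries that are not tagged
dirty:

/* Sketch only: drop a DAX radix tree entry unless it is dirty. */
static int dax_try_invalidate_entry(struct address_space *mapping,
				    pgoff_t index)
{
	int ret = 0;

	spin_lock_irq(&mapping->tree_lock);
	/* Keep dirty entries: writeback still needs to find them. */
	if (!radix_tree_tag_get(&mapping->page_tree, index,
				PAGECACHE_TAG_DIRTY)) {
		radix_tree_delete(&mapping->page_tree, index);
		ret = 1;	/* clean entry dropped */
	}
	spin_unlock_irq(&mapping->tree_lock);
	return ret;
}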


Re: [PATCH 1/4] fs: fix data invalidation in the cleancache during direct IO

2017-04-19 Thread Andrey Ryabinin


On 04/19/2017 01:46 AM, Andrew Morton wrote:
> On Fri, 14 Apr 2017 17:07:50 +0300 Andrey Ryabinin  wrote:
> 
>> Some direct write fs hooks call invalidate_inode_pages2[_range]()
>> conditionally iff mapping->nrpages is not zero. If the page cache is empty,
>> a buffered read following a direct IO write would get stale data from
>> the cleancache.
>>
>> Also it doesn't feel right to check only for ->nrpages because
>> invalidate_inode_pages2[_range] invalidates exceptional entries as well.
>>
>> Fix this by calling invalidate_inode_pages2[_range]() regardless of nrpages
>> state.
> 
> I'm not understanding this.  I can buy the argument about
> nrexceptional, but why does cleancache require the
> invalidate_inode_pages2_range() call even when ->nrpages is zero?
> 
> I *assume* it's because invalidate_inode_pages2_range() calls
> cleancache_invalidate_inode(), yes?  If so, can we please add this to
> the changelog?  If not then please explain further.
> 

Yes, your assumption is correct. I'll fix the changelog.


Re: [PATCH 1/4] fs: fix data invalidation in the cleancache during direct IO

2017-04-19 Thread Andrey Ryabinin
On 04/18/2017 10:38 PM, Ross Zwisler wrote:
> On Fri, Apr 14, 2017 at 05:07:50PM +0300, Andrey Ryabinin wrote:
>> Some direct write fs hooks call invalidate_inode_pages2[_range]()
>> conditionally iff mapping->nrpages is not zero. If the page cache is empty,
>> a buffered read following a direct IO write would get stale data from
>> the cleancache.
>>
>> Also it doesn't feel right to check only for ->nrpages because
>> invalidate_inode_pages2[_range] invalidates exceptional entries as well.
>>
>> Fix this by calling invalidate_inode_pages2[_range]() regardless of nrpages
>> state.
>>
>> Fixes: c515e1fd361c ("mm/fs: add hooks to support cleancache")
>> Signed-off-by: Andrey Ryabinin 
>> ---
> <>
>> diff --git a/fs/dax.c b/fs/dax.c
>> index 2e382fe..1e8cca0 100644
>> --- a/fs/dax.c
>> +++ b/fs/dax.c
>> @@ -1047,7 +1047,7 @@ dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
>>   * into page tables. We have to tear down these mappings so that data
>>   * written by write(2) is visible in mmap.
>>   */
>> -if ((iomap->flags & IOMAP_F_NEW) && inode->i_mapping->nrpages) {
>> +if ((iomap->flags & IOMAP_F_NEW)) {
>>  invalidate_inode_pages2_range(inode->i_mapping,
>>pos >> PAGE_SHIFT,
>>(end - 1) >> PAGE_SHIFT);
> 
> tl;dr: I think the old code is correct, and that you don't need this change.
> 
> This should be harmless, but could slow us down a little if we keep
> calling invalidate_inode_pages2_range() without really needing to.  Really for
> DAX I think we need to call invalidate_inode_pages2_range() only if we have
> zero pages mapped over the place where we are doing I/O, which is why we check
> nrpages.
> 

The check for ->nrpages only looks strange, because invalidate_inode_pages2_range()
also invalidates exceptional radix tree entries. Is it correct that we invalidate
exceptional entries only if ->nrpages > 0 and skip invalidation otherwise?


> Is DAX even allowed to be used at the same time as cleancache?  From a brief
> look at Documentation/vm/cleancache.txt, it seems like these two features are
> incompatible.  With DAX we already are avoiding the page cache completely.

tl;dr: I think you're right.

cleancache may store any PageUptodate && PageMappedToDisk page evicted from the
page cache (see __delete_from_page_cache()). DAX deletes hole pages via
__delete_from_page_cache(), but I don't see us marking such pages as Uptodate
or MappedToDisk, so they will never go into the cleancache.

Later, cleancache_get_page() would be reached (e.g. it's called from
mpage_readpages(), which is called from blkdev_read_pages()), but I assume that
DAX doesn't use the a_ops->readpages() method, so cleancache_get_page() is
never called from DAX.
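
For reference, the cleancache hook being referred to, lightly simplified from
the __delete_from_page_cache() hunk added by c515e1fd361c (the surrounding
radix tree removal and accounting are elided here): only fully uptodate,
disk-backed pages are offered to cleancache, which is why DAX hole pages never
end up there.

void __delete_from_page_cache(struct page *page, void *shadow)
{
	struct address_space *mapping = page->mapping;

	/*
	 * Uptodate, disk-backed pages are saved in the cleancache;
	 * anything else invalidates any stale cleancache copy.
	 */
	if (PageUptodate(page) && PageMappedToDisk(page))
		cleancache_put_page(page);
	else
		cleancache_invalidate_page(mapping, page);

	/* ... radix tree removal and accounting elided ... */
}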


> Anyway, I don't see how this change in DAX can save us from a data corruption
> (which is what you're seeing, right?), and I think it could slow us down, so
> I'd prefer to leave things as they are.
> 

I'll remove this hunk from v2.

Thanks.


Re: [PATCH 1/4] fs: fix data invalidation in the cleancache during direct IO

2017-04-18 Thread Andrew Morton
On Fri, 14 Apr 2017 17:07:50 +0300 Andrey Ryabinin  wrote:

> Some direct write fs hooks call invalidate_inode_pages2[_range]()
> conditionally iff mapping->nrpages is not zero. If the page cache is empty,
> a buffered read following a direct IO write would get stale data from
> the cleancache.
> 
> Also it doesn't feel right to check only for ->nrpages because
> invalidate_inode_pages2[_range] invalidates exceptional entries as well.
> 
> Fix this by calling invalidate_inode_pages2[_range]() regardless of nrpages
> state.

I'm not understanding this.  I can buy the argument about
nrexceptional, but why does cleancache require the
invalidate_inode_pages2_range() call even when ->nrpages is zero?

I *assume* it's because invalidate_inode_pages2_range() calls
cleancache_invalidate_inode(), yes?  If so, can we please add this to
the changelog?  If not then please explain further.
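
Andrey confirms this in his reply above. For reference, a simplified sketch of
the mm/truncate.c structure in question (not verbatim kernel code): the
cleancache flush sits inside invalidate_inode_pages2_range() itself, so
skipping the call when ->nrpages == 0 also skips the cleancache invalidation.

int invalidate_inode_pages2_range(struct address_space *mapping,
				  pgoff_t start, pgoff_t end)
{
	int ret = 0;

	/* ... walk the radix tree, unmapping and invalidating pages ... */

	/*
	 * Reached even when no pages were found above; this is the side
	 * effect that the ->nrpages checks in the callers were skipping.
	 */
	cleancache_invalidate_inode(mapping);
	return ret;
}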


Re: [PATCH 1/4] fs: fix data invalidation in the cleancache during direct IO

2017-04-18 Thread Ross Zwisler
On Fri, Apr 14, 2017 at 05:07:50PM +0300, Andrey Ryabinin wrote:
> Some direct write fs hooks call invalidate_inode_pages2[_range]()
> conditionally iff mapping->nrpages is not zero. If the page cache is empty,
> a buffered read following a direct IO write would get stale data from
> the cleancache.
> 
> Also it doesn't feel right to check only for ->nrpages because
> invalidate_inode_pages2[_range] invalidates exceptional entries as well.
> 
> Fix this by calling invalidate_inode_pages2[_range]() regardless of nrpages
> state.
> 
> Fixes: c515e1fd361c ("mm/fs: add hooks to support cleancache")
> Signed-off-by: Andrey Ryabinin 
> ---
<>
> diff --git a/fs/dax.c b/fs/dax.c
> index 2e382fe..1e8cca0 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -1047,7 +1047,7 @@ dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
>* into page tables. We have to tear down these mappings so that data
>* written by write(2) is visible in mmap.
>*/
> - if ((iomap->flags & IOMAP_F_NEW) && inode->i_mapping->nrpages) {
> + if ((iomap->flags & IOMAP_F_NEW)) {
>   invalidate_inode_pages2_range(inode->i_mapping,
> pos >> PAGE_SHIFT,
> (end - 1) >> PAGE_SHIFT);

tl;dr: I think the old code is correct, and that you don't need this change.

This should be harmless, but could slow us down a little if we keep
calling invalidate_inode_pages2_range() without really needing to.  Really for
DAX I think we need to call invalidate_inode_pages2_range() only if we have
zero pages mapped over the place where we are doing I/O, which is why we check
nrpages.

Is DAX even allowed to be used at the same time as cleancache?  From a brief
look at Documentation/vm/cleancache.txt, it seems like these two features are
incompatible.  With DAX we already are avoiding the page cache completely.

Anyway, I don't see how this change in DAX can save us from a data corruption
(which is what you're seeing, right?), and I think it could slow us down, so
I'd prefer to leave things as they are.


[PATCH 1/4] fs: fix data invalidation in the cleancache during direct IO

2017-04-14 Thread Andrey Ryabinin
Some direct write fs hooks call invalidate_inode_pages2[_range]()
conditionally iff mapping->nrpages is not zero. If the page cache is empty,
a buffered read following a direct IO write would get stale data from
the cleancache.

Also it doesn't feel right to check only for ->nrpages because
invalidate_inode_pages2[_range] invalidates exceptional entries as well.

Fix this by calling invalidate_inode_pages2[_range]() regardless of nrpages
state.

Fixes: c515e1fd361c ("mm/fs: add hooks to support cleancache")
Signed-off-by: Andrey Ryabinin 
---
 fs/9p/vfs_file.c |  2 +-
 fs/cifs/inode.c  |  2 +-
 fs/dax.c |  2 +-
 fs/iomap.c   | 16 +++-
 fs/nfs/direct.c  |  6 ++
 fs/nfs/inode.c   |  8 +---
 mm/filemap.c | 26 +++---
 7 files changed, 28 insertions(+), 34 deletions(-)

diff --git a/fs/9p/vfs_file.c b/fs/9p/vfs_file.c
index 3de3b4a8..786d0de 100644
--- a/fs/9p/vfs_file.c
+++ b/fs/9p/vfs_file.c
@@ -423,7 +423,7 @@ v9fs_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
unsigned long pg_start, pg_end;
pg_start = origin >> PAGE_SHIFT;
pg_end = (origin + retval - 1) >> PAGE_SHIFT;
-   if (inode->i_mapping && inode->i_mapping->nrpages)
+   if (inode->i_mapping)
invalidate_inode_pages2_range(inode->i_mapping,
  pg_start, pg_end);
iocb->ki_pos += retval;
diff --git a/fs/cifs/inode.c b/fs/cifs/inode.c
index c3b2fa0..6539fa3 100644
--- a/fs/cifs/inode.c
+++ b/fs/cifs/inode.c
@@ -1857,7 +1857,7 @@ cifs_invalidate_mapping(struct inode *inode)
 {
int rc = 0;
 
-   if (inode->i_mapping && inode->i_mapping->nrpages != 0) {
+   if (inode->i_mapping) {
rc = invalidate_inode_pages2(inode->i_mapping);
if (rc)
cifs_dbg(VFS, "%s: could not invalidate inode %p\n",
diff --git a/fs/dax.c b/fs/dax.c
index 2e382fe..1e8cca0 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1047,7 +1047,7 @@ dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 * into page tables. We have to tear down these mappings so that data
 * written by write(2) is visible in mmap.
 */
-   if ((iomap->flags & IOMAP_F_NEW) && inode->i_mapping->nrpages) {
+   if ((iomap->flags & IOMAP_F_NEW)) {
invalidate_inode_pages2_range(inode->i_mapping,
  pos >> PAGE_SHIFT,
  (end - 1) >> PAGE_SHIFT);
diff --git a/fs/iomap.c b/fs/iomap.c
index 0b457ff..7e1f947 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -880,16 +880,14 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
flags |= IOMAP_WRITE;
}
 
-   if (mapping->nrpages) {
-   ret = filemap_write_and_wait_range(mapping, start, end);
-   if (ret)
-   goto out_free_dio;
+   ret = filemap_write_and_wait_range(mapping, start, end);
+   if (ret)
+   goto out_free_dio;
 
-   ret = invalidate_inode_pages2_range(mapping,
+   ret = invalidate_inode_pages2_range(mapping,
start >> PAGE_SHIFT, end >> PAGE_SHIFT);
-   WARN_ON_ONCE(ret);
-   ret = 0;
-   }
+   WARN_ON_ONCE(ret);
+   ret = 0;
 
inode_dio_begin(inode);
 
@@ -944,7 +942,7 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 * one is a pretty crazy thing to do, so we don't support it 100%.  If
 * this invalidation fails, tough, the write still worked...
 */
-   if (iov_iter_rw(iter) == WRITE && mapping->nrpages) {
+   if (iov_iter_rw(iter) == WRITE) {
int err = invalidate_inode_pages2_range(mapping,
start >> PAGE_SHIFT, end >> PAGE_SHIFT);
WARN_ON_ONCE(err);
diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index aab32fc..183ab4d 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -1024,10 +1024,8 @@ ssize_t nfs_file_direct_write(struct kiocb *iocb, struct iov_iter *iter)
 
result = nfs_direct_write_schedule_iovec(dreq, iter, pos);
 
-   if (mapping->nrpages) {
-   invalidate_inode_pages2_range(mapping,
- pos >> PAGE_SHIFT, end);
-   }
+   invalidate_inode_pages2_range(mapping,
+   pos >> PAGE_SHIFT, end);
 
nfs_end_io_direct(inode);
 
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index f489a5a..b727ec8 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -1118,10 +1118,12 @@ static int nfs_invalidate_mapping(struct inode *inode, struct address_space *map
if (ret < 0)
return ret;
}
-   ret = invalidate_inode_pages2(mapping);
-   if (ret < 0)
-