On 09/22/2016 08:06 PM, Qu Wenruo wrote: > > > At 09/23/2016 02:47 AM, Goldwyn Rodrigues wrote: >> From: Goldwyn Rodrigues <rgold...@suse.com> >> >> While free'ing qgroup->reserved resources, we must check >> if the page is already commmitted to disk or still in memory. >> If not, the reserve free is doubly accounted, once while >> invalidating the page, and the next time while free'ing >> delalloc. This results is qgroup->reserved(u64) going subzero, >> thus very large value. So, no further I/O can be performed. >> >> This is also expressed in the comments, but not performed. >> >> Testcase: >> SCRATCH_DEV=/dev/vdb >> SCRATCH_MNT=/mnt >> mkfs.btrfs -f $SCRATCH_DEV >> mount -t btrfs $SCRATCH_DEV $SCRATCH_MNT >> cd $SCRATCH_MNT >> btrfs quota enable $SCRATCH_MNT >> btrfs subvolume create a >> btrfs qgroup limit 50m a $SCRATCH_MNT >> sync >> for c in {1..15}; do >> dd if=/dev/zero bs=1M count=40 of=$SCRATCH_MNT/a/file; >> done >> >> sleep 10 >> sync >> sleep 5 >> >> touch $SCRATCH_MNT/a/newfile >> >> echo "Removing file" >> rm $SCRATCH_MNT/a/file >> >> Fixes: b9d0b38928 ("btrfs: Add handler for invalidate page") >> Signed-off-by: Goldwyn Rodrigues <rgold...@suse.com> >> --- >> fs/btrfs/inode.c | 3 ++- >> 1 file changed, 2 insertions(+), 1 deletion(-) >> >> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c >> index e6811c4..2e2a026 100644 >> --- a/fs/btrfs/inode.c >> +++ b/fs/btrfs/inode.c >> @@ -8917,7 +8917,8 @@ again: >> * 2) Not written to disk >> * This means the reserved space should be freed here. >> */ >> - btrfs_qgroup_free_data(inode, page_start, PAGE_SIZE); >> + if (PageDirty(page)) >> + btrfs_qgroup_free_data(inode, page_start, PAGE_SIZE); >> if (!inode_evicting) { >> clear_extent_bit(tree, page_start, page_end, >> EXTENT_LOCKED | EXTENT_DIRTY | >> > Thanks for the test case. > > However for the fix, I'm afraid it may not be the root cause. > > Here, if the pages are dirty, then corresponding range is marked > EXTENT_QGROUP_RESERVED. > Then btrfs_qgroup_free_data() will clear that bit and reduce the number. > > If the pages are already committed, then corresponding range won't be > marked EXTENT_QGROUP_RESERVED. > Later btrfs_qgroup_free_data() won't reduce any bytes, since it will > only reduce the bytes if it cleared EXTENT_QGROUP_RESERVED bit. > > If everything goes well there is no need to check PageDirty() here, as > we have EXTENT_QGROUP_RESERVED bit for that accounting. > > So there is some other thing causing EXTENT_QGROUP_RESERVED bit out of > sync with dirty pages. > Considering you did it 15 times to reproduce the problem, maybe there is > some race causing the problem? >
You can have pages marked as not dirty with EXTENT_QGROUP_RESERVED set for a truncate operation. Performing dd on the same file, truncates the file before overwriting, while the pages of the previous writes are still in memory and not committed to disk. truncate_inode_page() -> truncate_complete_page() clears the dirty flag. So, you can have a case where the EXTENT_QGROUP_RESERVED bit is set while the page is not listed as dirty because the truncate "cleared" all the dirty pages. -- Goldwyn -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html