Hi, Filipe Manana > -----Original Message----- > From: Filipe Manana [mailto:fdman...@gmail.com] > Sent: Monday, November 16, 2015 6:57 PM > To: Zhao Lei <zhao...@cn.fujitsu.com> > Cc: linux-btrfs@vger.kernel.org > Subject: Re: [PATCH] btrfs: Continue replace when set_block_ro failed > > On Mon, Nov 16, 2015 at 10:44 AM, Zhao Lei <zhao...@cn.fujitsu.com> wrote: > > Hi, Filipe Manana > > > >> -----Original Message----- > >> From: linux-btrfs-ow...@vger.kernel.org > >> [mailto:linux-btrfs-ow...@vger.kernel.org] On Behalf Of Filipe Manana > >> Sent: Monday, November 16, 2015 6:27 PM > >> To: Zhao Lei <zhao...@cn.fujitsu.com> > >> Cc: linux-btrfs@vger.kernel.org > >> Subject: Re: [PATCH] btrfs: Continue replace when set_block_ro failed > >> > >> On Mon, Nov 16, 2015 at 10:07 AM, Zhao Lei <zhao...@cn.fujitsu.com> > wrote: > >> > Hi, Filipe Manana > >> > > >> > Thanks for review. > >> > > >> >> -----Original Message----- > >> >> From: Filipe Manana [mailto:fdman...@gmail.com] > >> >> Sent: Friday, November 13, 2015 8:02 PM > >> >> To: Zhao Lei <zhao...@cn.fujitsu.com> > >> >> Cc: linux-btrfs@vger.kernel.org > >> >> Subject: Re: [PATCH] btrfs: Continue replace when set_block_ro > >> >> failed > >> >> > >> >> On Fri, Nov 13, 2015 at 11:38 AM, Zhao Lei <zhao...@cn.fujitsu.com> > wrote: > >> >> > xfstests/011 failed in node with small_size filesystem. > >> >> > Can be reproduced by following script: > >> >> > DEV_LIST="/dev/vdd /dev/vde" > >> >> > DEV_REPLACE="/dev/vdf" > >> >> > > >> >> > do_test() > >> >> > { > >> >> > local mkfs_opt="$1" > >> >> > local size="$2" > >> >> > > >> >> > dmesg -c >/dev/null > >> >> > umount $SCRATCH_MNT &>/dev/null > >> >> > > >> >> > echo mkfs.btrfs -f $mkfs_opt "${DEV_LIST[*]}" > >> >> > mkfs.btrfs -f $mkfs_opt "${DEV_LIST[@]}" || return 1 > >> >> > mount "${DEV_LIST[0]}" $SCRATCH_MNT > >> >> > > >> >> > echo -n "Writing big files" > >> >> > dd if=/dev/urandom of=$SCRATCH_MNT/t0 bs=1M > >> >> count=1 >/dev/null 2>&1 > >> >> > for ((i = 1; i <= size; i++)); do > >> >> > echo -n . > >> >> > /bin/cp $SCRATCH_MNT/t0 $SCRATCH_MNT/t$i || return > 1 > >> >> > done > >> >> > echo > >> >> > > >> >> > echo Start replace > >> >> > btrfs replace start -Bf "${DEV_LIST[0]}" "$DEV_REPLACE" > >> >> $SCRATCH_MNT || { > >> >> > dmesg > >> >> > return 1 > >> >> > } > >> >> > return 0 > >> >> > } > >> >> > > >> >> > # Set size to value near fs size > >> >> > # for example, 1897 can trigger this bug in 2.6G device. > >> >> > # > >> >> > ./do_test "-d raid1 -m raid1" 1897 > >> >> > > >> >> > System will report replace fail with following warning in dmesg: > >> >> > > >> >> > Reason: > >> >> > When big data writen to fs, the whole free space will be > >> >> > allocated for data chunk. > >> >> > And operation as scrub need to set_block_ro(), and when there > >> >> > is only one metadata chunk in system(or other metadata chunks > >> >> > are all full), the function will try to allocate a new chunk, > >> >> > and failed because no space in device. > >> >> > > >> >> > Fix: > >> >> > When set_block_ro failed for metadata chunk, it is not a > >> >> > problem because scrub_lock forbids commit_trancaction. > >> >> > >> >> Hi Zhao, I'm confused reading this explanation. > >> >> > >> >> Why isn't it a problem if the block group gets modified while the > >> >> transaction is ongoing? Through writepages() for example. > >> >> Committing a transaction isn't the only way to write data or > >> >> metadata to a block group. > >> >> > >> > Metadata chunks always updated in cow, the writepages() will write > >> > new data to different place with operation done by replace. > >> > >> So why do we try to set the block group ro? Your change is really > >> confusing - if we fail setting the block group to read-only mode, we > >> just ignore it and proceed > >> - why bother in the first place to set it read-only? > >> > > I find a problem that set_group_ro will failed for metadata when disk > > almost full, but we also have a good news that the time when > > set_group_ro failed is just when it is not necessary. > > Yes - but the code will ignore the failure for any error code, not just > -ENOSPC. > Surely btrfs_inc_block_group_ro() can return error codes different from > -ENOSPC. You should make it clear in the code (and in the change log) why > ENOSPC is such a special case - why we need to set the block to RO and why we > can ignore the ENOSPC failure. > Thanks for notice. Will make, test and send v2.
Thanks Zhaolei > > > >> Someone reading the code will find it fishy, and going back to git > >> history, your change log doesn't really explains why this is being > >> done and why is it safe to do it. > >> > > Thanks for notice, it is necessary to add detail explanation into changelog. > > > >> You also forgot to paste the warning you mention below "System will > >> report replace fail with following warning in dmesg:" in the change log. > >> > > Agree, I'll paste it in changelog. > > > > Thanks > > Zhaolei > > > >> thanks > >> > >> > > >> > Thanks > >> > Zhaolei > >> > > >> >> thanks > >> >> > >> >> > Let replace continue in this case is no problem. > >> >> > > >> >> > Tested by above script, and xfstests/011, plus 100 times xfstests/070. > >> >> > > >> >> > Signed-off-by: Zhao Lei <zhao...@cn.fujitsu.com> > >> >> > --- > >> >> > fs/btrfs/scrub.c | 9 ++++----- > >> >> > 1 file changed, 4 insertions(+), 5 deletions(-) > >> >> > > >> >> > diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c index > >> >> > a39f5d1..5a30753 100644 > >> >> > --- a/fs/btrfs/scrub.c > >> >> > +++ b/fs/btrfs/scrub.c > >> >> > @@ -3532,6 +3532,7 @@ int scrub_enumerate_chunks(struct > >> >> > scrub_ctx > >> >> *sctx, > >> >> > u64 length; > >> >> > u64 chunk_offset; > >> >> > int ret = 0; > >> >> > + int ro_set = 0; > >> >> > int slot; > >> >> > struct extent_buffer *l; > >> >> > struct btrfs_key key; > >> >> > @@ -3617,10 +3618,7 @@ int scrub_enumerate_chunks(struct > >> >> > scrub_ctx > >> >> *sctx, > >> >> > scrub_pause_on(fs_info); > >> >> > ret = btrfs_inc_block_group_ro(root, cache); > >> >> > scrub_pause_off(fs_info); > >> >> > - if (ret) { > >> >> > - btrfs_put_block_group(cache); > >> >> > - break; > >> >> > - } > >> >> > + ro_set = !ret; > >> >> > > >> >> > dev_replace->cursor_right = found_key.offset + > >> length; > >> >> > dev_replace->cursor_left = found_key.offset; @@ > >> >> > -3660,7 +3658,8 @@ int scrub_enumerate_chunks(struct scrub_ctx > >> >> > *sctx, > >> >> > > >> >> > scrub_pause_off(fs_info); > >> >> > > >> >> > - btrfs_dec_block_group_ro(root, cache); > >> >> > + if (ro_set) > >> >> > + btrfs_dec_block_group_ro(root, cache); > >> >> > > >> >> > btrfs_put_block_group(cache); > >> >> > if (ret) > >> >> > -- > >> >> > 1.8.5.1 > >> >> > > >> >> > -- > >> >> > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" > >> >> > in the body of a message to majord...@vger.kernel.org More > >> >> > majordomo info at http://vger.kernel.org/majordomo-info.html > >> >> > >> >> > >> >> > >> >> -- > >> >> Filipe David Manana, > >> >> > >> >> "Reasonable men adapt themselves to the world. > >> >> Unreasonable men adapt the world to themselves. > >> >> That's why all progress depends on unreasonable men." > >> > > >> > >> > >> > >> -- > >> Filipe David Manana, > >> > >> "Reasonable men adapt themselves to the world. > >> Unreasonable men adapt the world to themselves. > >> That's why all progress depends on unreasonable men." > >> -- > >> To unsubscribe from this list: send the line "unsubscribe > >> linux-btrfs" in the body of a message to majord...@vger.kernel.org > >> More majordomo info at http://vger.kernel.org/majordomo-info.html > > > > > > -- > Filipe David Manana, > > "Reasonable men adapt themselves to the world. > Unreasonable men adapt the world to themselves. > That's why all progress depends on unreasonable men." -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html