On Tue, Sep 22, 2015 at 08:59:57AM -0400, Jeff Mahoney wrote:
[snip]
> So if the way we want to prevent the loss of raid type info is by
> maintaining the last block group allocated with that raid type, fine,
> but that's a separate discussion.  Personally, I think keeping 1GB
> allocated as a placeholder is a bit much.  Beyond that, I've been
> thinking casually about ways to direct the allocator to use certain
> devices for certain things (e.g. in a hybrid system with SSDs and
> HDDs, always allocate metadata on the SSD) and there's some overlap
> there.  As it stands, we can fake that in mkfs but it'll get stomped
> by balance nearly immediately.

   In terms of selecting the location of chunks within the allocator,
I wrote up a design for a pretty generic way of doing it some time ago
[1]. It would allow things like metadata to SSDs, but also defining
failure domains for replication (i.e. "I want two copies of my data in
RAID-1, but I want each copy to go on a different storage array"). It
would also give us the ability to handle different allocation
strategies, such as filling up one device at a time.
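
   As a very rough illustration of the kind of policy selection that
design is aiming at, here's a sketch written for this mail (the names
Device and pick_devices are invented here; this is not the code from
[1] or anything resembling the kernel allocator):

```python
# Sketch only: Device and pick_devices are invented names for this
# example, not code from [1] or the kernel allocator.

class Device:
    def __init__(self, name, array, ssd, free_bytes):
        self.name = name
        self.array = array           # failure domain, e.g. which storage array
        self.ssd = ssd               # True for solid-state devices
        self.free_bytes = free_bytes

def pick_devices(devices, copies, want_ssd=None):
    """Pick one device per copy, each copy in a distinct failure domain."""
    candidates = [d for d in devices
                  if want_ssd is None or d.ssd == want_ssd]
    # Keep only the emptiest device within each failure domain.
    best = {}
    for d in candidates:
        if d.array not in best or d.free_bytes > best[d.array].free_bytes:
            best[d.array] = d
    # Most free space first; other strategies (fill one device at a
    # time, etc.) would just sort differently here.
    picks = sorted(best.values(), key=lambda d: d.free_bytes, reverse=True)
    if len(picks) < copies:
        raise ValueError("not enough failure domains for %d copies" % copies)
    return picks[:copies]

# RAID-1 data, two copies, each on a different array:
devs = [Device("sda", "array0", False, 100 << 30),
        Device("sdb", "array0", False, 200 << 30),
        Device("sdc", "array1", False, 150 << 30)]
print([d.name for d in pick_devices(devs, 2)])   # -> ['sdb', 'sdc']
```

The same selection with want_ssd=True would express the
"metadata always goes to the SSD" case Jeff mentions above.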

   I got as far as some python to demonstrate the algorithms and
structure (also in that mail thread). I started trying to work out how
to rewrite the allocator in the kernel to support it, but I got lost
in the code fairly rapidly, particularly about how to store the
relevant policy metadata (for the FS as a whole, and, later, on a
per-subvolume basis).

   Hugo.

[1] http://www.mail-archive.com/linux-btrfs%40vger.kernel.org/msg33499.html

> -Jeff
> 
> > If we delete all block groups for a raid type, it not only causes
> > the above bug, but may also change the filesystem to all-single in
> > some cases.
> > 
> > Test: Tested with the above script, and confirmed the logic by
> > debug output.
> > 
> > Signed-off-by: Zhao Lei <zhao...@cn.fujitsu.com>
> > ---
> >  fs/btrfs/extent-tree.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> > index 5411f0a..35cf7eb 100644
> > --- a/fs/btrfs/extent-tree.c
> > +++ b/fs/btrfs/extent-tree.c
> > @@ -10012,7 +10012,8 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info)
> >  					       bg_list);
> >  		space_info = block_group->space_info;
> >  		list_del_init(&block_group->bg_list);
> > -		if (ret || btrfs_mixed_space_info(space_info)) {
> > +		if (ret || btrfs_mixed_space_info(space_info) ||
> > +		    block_group->list.next == block_group->list.prev) {
> >  			btrfs_put_block_group(block_group);
> >  			continue;
> >  		}
> > 
> 
> 
> -- 
> Jeff Mahoney
> SUSE Labs

-- 
Hugo Mills             | "Big data" doesn't just mean increasing the font
hugo@... carfax.org.uk | size.
http://carfax.org.uk/  |
PGP: E2AB1DE4          |
