I thought I'd follow up and give everyone an update, in case anyone had further interest.

I've rebuilt the RAID10 volume in question with a Samsung 840 Pro as the bcache cache (front) device. It's 5x600GB SAS 15k RPM drives in RAID10, with the 512GB SSD as the bcache cache.

2014-09-02 11:23:16 root@eanna i /var/lib/libvirt/images # lsblk
NAME      MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda         8:0    0 558.9G  0 disk
└─bcache3 254:3    0 558.9G  0 disk /var/lib/btrfs/data
sdb         8:16   0 558.9G  0 disk
└─bcache2 254:2    0 558.9G  0 disk
sdc         8:32   0 558.9G  0 disk
└─bcache1 254:1    0 558.9G  0 disk
sdd         8:48   0 558.9G  0 disk
└─bcache0 254:0    0 558.9G  0 disk
sde         8:64   0 558.9G  0 disk
└─bcache4 254:4    0 558.9G  0 disk
sdf         8:80   0   1.8T  0 disk
└─sdf1      8:81   0   1.8T  0 part
sdg         8:96   0   477G  0 disk /var/lib/btrfs/system
sdh         8:112  0   477G  0 disk
sdi         8:128  0   477G  0 disk
├─bcache0 254:0    0 558.9G  0 disk
├─bcache1 254:1    0 558.9G  0 disk
├─bcache2 254:2    0 558.9G  0 disk
├─bcache3 254:3    0 558.9G  0 disk /var/lib/btrfs/data
└─bcache4 254:4    0 558.9G  0 disk
sr0        11:0    1  1024M  0 rom
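For anyone curious about the mechanics, this is roughly how the volume was assembled, reconstructed from the lsblk output above rather than pasted from my shell history - I've left out the bucket/block-size flags (the -w/-b values you'd want to match to the 840 Pro), and /dev/sdi is the SSD acting as the cache set for all five backing devices:

# make-bcache -C /dev/sdi -B /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde
# mkfs.btrfs -d raid10 -m raid10 /dev/bcache[0-4]
# mount /dev/bcache0 /var/lib/btrfs/data
# cat /sys/block/bcache0/bcache/cache_mode

The last command is just a sanity check on the cache mode - bcache defaults to writethrough, and it's only writeback that turns the guest's random writes into sequential writes on the spinning disks.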
I further split the system and data drives of the Win7 VM guest. It's very interesting to see the huge level of fragmentation I'm seeing, even with the help of the ordered writes offered by bcache - in other words, while bcache seems to be offering the guest stability and better behavior, the underlying filesystem is still seeing a level of fragmentation that has me scratching my head. That being said, I don't know what normal fragmentation for a Win7 guest system drive would look like, so it could be I'm just operating in my zone of ignorance again.

2014-09-01 14:41:19 root@eanna i /var/lib/libvirt/images # filefrag atlas-*
atlas-data.qcow2: 7 extents found
atlas-system.qcow2: 154 extents found

2014-09-01 18:12:27 root@eanna i /var/lib/libvirt/images # filefrag atlas-*
atlas-data.qcow2: 564 extents found
atlas-system.qcow2: 28171 extents found

2014-09-02 08:22:00 root@eanna i /var/lib/libvirt/images # filefrag atlas-*
atlas-data.qcow2: 564 extents found
atlas-system.qcow2: 35281 extents found

2014-09-02 08:44:43 root@eanna i /var/lib/libvirt/images # filefrag atlas-*
atlas-data.qcow2: 564 extents found
atlas-system.qcow2: 37203 extents found

2014-09-02 10:14:32 root@eanna i /var/lib/libvirt/images # filefrag atlas-*
atlas-data.qcow2: 564 extents found
atlas-system.qcow2: 40903 extents found
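Two things on my list to try against that growth in atlas-system.qcow2 - untested on this box so far, so consider them sketches rather than a recommendation: marking the images directory NOCOW (which only affects files created after the attribute is set, so the existing images would need to be copied back in), or knocking the extent count back down with a defrag while the guest is shut off:

# chattr +C /var/lib/libvirt/images
# lsattr -d /var/lib/libvirt/images
# btrfs filesystem defragment -t 32M /var/lib/libvirt/images/atlas-system.qcow2
# filefrag atlas-system.qcow2

The trade-off with +C is that those files also lose btrfs checksumming, and a defrag is only a point-in-time fix - the guest will fragment the file again as it writes.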
On Thu, Aug 14, 2014 at 6:05 PM, Chris Murphy <li...@colorremedies.com> wrote:
>
> On Aug 14, 2014, at 5:16 PM, G. Richard Bellamy <rbell...@pteradigm.com> wrote:
>
>> On Thu, Aug 14, 2014 at 11:40 AM, Chris Murphy <li...@colorremedies.com> wrote:
>>> and there may be a fit for bcache here because you actually would get these
>>> random writes committed to stable media much faster in that case, and a lot
>>> of work has been done to make this more reliable than battery backed write
>>> caches on hardware raid.
>>
>> umph... heard of bcache, but never looked at it or considered it as an
>> option in this scenario. After reading the doco and some of the design
>> documents, it's looking like bcache and md/mdadm or LVM could do the
>> trick.
>
> They are all separate things. I haven't worked with the LVM caching (which
> uses dm-cache as the backend, similar to how it uses md code on the backend
> for all of its RAID level support), there could be some advantages there if
> you have to use LVM anyway, but the design goal of bcache sounds more suited
> for your workload. And it's got
>
>> The gotchas state clearly that btrfs on top of bcache is not recommended.
>
> Yeah I'm not sure if the suggested changes from 3.12 btrfs + bcache problems
> went through. Eventually they should work together. But I'd use bcache with
> XFS or ext4 when not in the mood for bleeding something.
>
>> However, can bcache be put 'in front' of a btrfs raid10 volume?
>
> More correctly you will mkfs.btrfs on the bcache devices, which are logical
> devices made from one or more backing devices, and a cache device.
>
> # make-bcache -w 2k -b 512k -C /dev/sdc -B /dev/sd[defg]
> # lsblk
> sdc         8:32   0    8G  0 disk
> ├─bcache0 252:0    0    8G  0 disk
> ├─bcache1 252:1    0    8G  0 disk
> ├─bcache2 252:2    0    8G  0 disk
> └─bcache3 252:3    0    8G  0 disk
> sdd         8:48   0    8G  0 disk
> └─bcache0 252:0    0    8G  0 disk
> sde         8:64   0    8G  0 disk
> └─bcache1 252:1    0    8G  0 disk
> sdf         8:80   0    8G  0 disk
> └─bcache2 252:2    0    8G  0 disk
> sdg         8:96   0    8G  0 disk
> └─bcache3 252:3    0    8G  0 disk
> # mkfs.btrfs -draid10 -mraid10 /dev/bcache[0123]
> # mount /dev/bcache0 /mnt
> # btrfs fi df /mnt
> Data, RAID10: total=2.00GiB, used=27.91MiB
> System, RAID10: total=64.00MiB, used=16.00KiB
> Metadata, RAID10: total=256.00MiB, used=160.00KiB
> GlobalReserve, single: total=16.00MiB, used=0.00B
> [* Yes I cheated and did a balance first so the df output looks cleaner.]
>
>> I think not, since btrfs volumes are not presented as individual block
>> devices, instead you've got several block devices (e.g. /dev/sda and
>> /dev/sdb are in a btrfs raid1, and can be seen individually by the
>> OS)... however I wish it could, since bcache "...turns random writes
>> into sequential writes", which solve entirely the problem which
>> prompts the nocow option in btrfs.
>
> Yeah but you've got something perturbing this in the VM guest, and probably
> also libvirt caching isn't ideal for that workload either. Now it may be
> safe, but at the expense of being chatty. I'm not yet convinced you avoid
> this problem with XFS, in which case you're in a better position to safely
> use bcache.
>
> Chris Murphy
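One follow-up on the libvirt caching point above: I still need to confirm what cache mode these guests' disks are actually configured with. Assuming the guest is defined in libvirt as 'atlas' (a guess based on the image names), the check and the change look something like:

# virsh dumpxml atlas | grep -E 'driver name|cache'
# virsh edit atlas

and then adjust the cache= attribute on each disk's <driver> element. cache='none' is the common suggestion for VM images, though opinions differ for btrfs specifically and I haven't verified it changes the fragmentation picture here.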