Re: btrfs und lvm-cache?

2015-12-23 Thread Martin Steigerwald
On Wednesday, 23 December 2015 at 11:45:28 CET, Neuer User wrote:
> Hello

Hi.

> I want to set up a small homeserver, based on an HP Microserver Gen8 (4GB
> RAM, 2x3TB HDD + 1x120GB SSD) and Proxmox as distro.
> 
> The server will be used to host a (small) number of virtual machines,
> most of them being LXC containers, a few being KVM machines. One of the
> LXC containers will host a fileserver with approx. 1 TB of data and another
> one a backup system for the desktops / laptops in my household, thus
> probably holding quite a lot of files. The LXC containers will use the
> filesystem of the Proxmox host, the KVM machines probably raw disk files
> (or qcow2).
> 
> I would like to combine high data integrity with some speed, so I
> thought of the following layout:
> 
> - both HDDs and the SSD in one LVM VG
> - one LV on each HDD, containing a btrfs filesystem
> - both btrfs LVs configured as RAID1
> - the single SSD used as an LVM cache device for both HDD LVs to speed up
> random access, where possible
> 
> Now, I wonder if that is a good architecture to go for. Any input on
> that? Is btrfs the right way to go, or should I rather go for ZFS
> (and purchase some more gigs of RAM)?
> 
> Will there be any problems arising from the lvmcache? btrfs only sees
> the HDDs, LVM does the SSD handling.

As far as I understand it, this way you basically lose the RAID 1 semantics of 
BTRFS. While the data is redundant on the HDDs, it is not redundant on the 
SSD. It may work for a pure read cache, but for write-through you definitely 
lose any data integrity protection a RAID 1 gives you.

Of course, you can use two SSDs and have them work as RAID 1 as well.

There is a patch set for in-BTRFS SSD caching. It consists of a patch set to 
add hot data tracking to VFS and a patch set adding support in BTRFS. But 
I haven't seen anything of these in quite some time.

Happy Christmas,
-- 
Martin


Re: btrfs und lvm-cache?

2015-12-23 Thread Neuer User
On 23.12.2015 at 12:21, Martin Steigerwald wrote:
> Hi.
> 
> As far as I understand it, this way you basically lose the RAID 1 semantics of 
> BTRFS. While the data is redundant on the HDDs, it is not redundant on the 
> SSD. It may work for a pure read cache, but for write-through you definitely 
> lose any data integrity protection a RAID 1 gives you.
> 
Hmm, are you sure? I thought LVM lies underneath btrfs. Btrfs thus
should not know about the caching SSD at all. It only knows of the two
LVs on the HDDs, reading and writing data from or to one or both of the
two LVs.

Only then does lvmcache decide whether it reads the data from the underlying
HDD or from the cache SSD. LVM shouldn't even know that the two LVs are
configured as RAID1 on btrfs, as this is a level higher. So for LVM the
two LVs are different data, both of which would need to be cached
independently on the SSD.

What might happen, though, is that there is data corruption on the SSD,
returning a mismatching checksum, so btrfs might think that the data is
incorrect on one LV (= HDD), although it is indeed correct there. That
would lead btrfs to read the data from the second LV (which might also
be in the SSD cache or not) and then update the (correct and
identical) data of the first LV with it.

Or do I see that wrong?

> Of course, you can use two SSDs and have them work as RAID 1 as well.
> 
> There is a patch set for in-BTRFS SSD-caching. It consists of a patch set to 
> add hot data tracking to VFS and a patch set for adding support in BTRFS. But 
> I haven't seen anything of these in quite some time.

That would be interesting, but for my project it's probably too late.

> 
> Happy christmas,
> 

Yeah, happy Christmas to you and everybody on the list.

Michael



Re: Loss of connection to Half of the drives

2015-12-23 Thread Donald Pearson
On Tue, Dec 22, 2015 at 10:13 PM, Duncan <1i5t5.dun...@cox.net> wrote:
> Donald Pearson posted on Tue, 22 Dec 2015 17:56:29 -0600 as excerpted:
>
>
>>> Also understand with Btrfs RAID 10 you can't lose more than 1 drive
>>> reliably. It's not like a strict raid1+0 where you can lose all of the
>>> "copy 1" *OR* "copy 2" mirrors.
>>
>> Pardon my pea brain but this sounds like a pretty bad design flaw?
>
> It's not a design flaw, it's EUNIMPLEMENTED.  Btrfs raid1, unlike say
> mdraid1 (and now various hardware raid vendors), implements exactly two
> copy raid1 -- each chunk is mirrored to exactly two devices.  And btrfs
> raid10, because it builds on btrfs raid1, is likewise exactly two copies.
>
> With raid1 on two devices, where those two copies go is defined, one to
> each device.  With raid1 on more than two devices, the current chunk-
> allocator will allocate one copy each to the two devices with the most
> free space left, so that if the devices are all the same size, they'll
> all be used to about the same level and will run out of space at about
> the same time.  (If they're not the same size, with one much larger than
> the others, it'll get one copy all the time, with the other copy going to
> the second largest or to each in turn once remaining empty sizes even
> out.)
>
> Similarly with raid10, except each strip is two-way mirrored and a stripe
> created of the mirrors.
>
> And because the raid is managed and allocated per-chunk, drop more than a
> single device, and it's very likely you _will_ be dropping both copies of
> _some_ chunks on raid1, and some strips of chunks on raid10, making them
> entirely unavailable.
>
> In that case you _might_ be able to mount degraded,ro, but you won't be
> able to mount writable.
>
> The other btrfs-only alternative at this point would be btrfs raid6,
> which should let you drop TWO devices before data is simply missing and
> unrecreatable from parity.  But btrfs raid6 is far newer and less mature
> than either raid1 or raid10, and running the truly latest versions is
> very strongly recommended up to v4.4 or so, which is actually soon to be
> released now, as older versions WILL quite likely have issues.  As it
> happens, kernel v4.4 is an LTS series, so the timing for btrfs raid5 and
> raid6 there is quite nice, as 4.4 should see them finally reasonably
> stable, and being LTS, should continue to be supported for quite some
> time.
>
> (The current btrfs list recommendation in general is to stay within two
> LTS versions in order to avoid getting /too/ far behind, as while
> stabilizing, btrfs isn't entirely stable and mature yet, and further back
> than that simply gets unrealistic to support very well.  That's 3.18 and
> 4.1 currently, with 3.18 being soon to drop as 4.4 is soon to release as
> the next LTS.  But as btrfs stabilizes further, it's somewhat likely that
> 4.1 or at least 4.4, will continue to be reasonably supported beyond the
> second LTS back phase, perhaps to the third, and sometime after that,
> support will probably last more or less as long as the LTS stable branch
> continues getting updates.)
>
> But even btrfs raid6 only lets you drop two devices before general data
> loss occurs.
>
> The other alternative, as regularly used and recommended by one regular
> poster here, would be btrfs raid1 on top of mdraid0 or possibly mdraid10
> or whatever.  The same general principle would apply to btrfs raid5 and
> raid6 as they mature, on top of mdraidN, with the important point being
> that the btrfs level has redundancy, raid1/10/5/6, since it has real-time
> data and metadata checksumming and integrity management features that are
> lacking in mdraid.  By putting the btrfs raid with either redundancy or
> parity on top, you get the benefit of actual error recovery that would be
> lacking if it was btrfs raid0 on top.
>
> That would let you manage loss of one entire set of the underlying mdraid
> devices, one copy of the overlying btrfs raid1/10 or one strip/parity of
> btrfs raid5, which could then be rebuilt from the other two, while
> maintaining btrfs data and metadata integrity as one copy (or stripe-
> minus-one-plus-one-parity) would always exist.  With btrfs raid6, it
> would of course let you lose two of the underlying sets of devices
> composing the btrfs raid6.
>
> In the precise scenario the OP posted, that would work well, since in the
> huge numbers of devices going offline case, it'd always be complete sets
> of devices, corresponding to one of the underlying mdraidNs, because the
> scenario is that set getting unplugged or whatever.
>
> Of course in the more general random N devices going offline case, with
> the N devices coming from any of the underlying mdraidNs, it could still
> result in not all data being available to the btrfs raid level, but
> except for mdraid0, the chances of it happening are still relatively low,
> and with mdraid0, they're still within reason, if not /as/ low.  But that
> general scenario isn't 

Re: Loss of connection to Half of the drives

2015-12-23 Thread Goffredo Baroncelli
On 2015-12-23 16:53, Donald Pearson wrote:
[...]
> 
> Additionally real Raid10 will run circles around what BTRFS is doing
> in terms of performance.  In the 20 drive array you're striping across
> 10 drives, in BTRFS right now you're striping across 2 no matter what.
> So not only do I lose in terms of resilience I lose in terms of
> performance.  I assume that N-way-mirroring used with BTRFS Raid10
> will also increase the stripe width so that will level out the
> performance but you're always going to be short a drive for equal
> resilience.

In the case of RAID10, to the best of my knowledge, BTRFS allocates each chunk 
across *all* the available devices. It uses the usual RAID0 (== striping) over 
a RAID1 (mirroring).

What you are describing is BTRFS RAID1, i.e. LINEAR over a RAID1: each chunk 
is allocated on *two*, and only *two*, different disks from the disk pool; the 
disks are the ones with the largest free space. Each chunk may be allocated on 
a different *pair* of disks.

> And finally the elephant in the room that comes with the necessary
> 11-way mirroring is that the usable capacity of that 20 drive array.
> Remember, pea brain so my math may be wrong in application and
> calculation but if it's made of 1T drives for 20T raw, there is only
> 1.82T usable (20 / 11) and if I'm completely off in that figure the
> point is still that such a high level of mirroring is going to
> excessively consume drive space.

Duncan talked about N-way mirroring, where each disk contains a copy of the 
same data. Nobody talked about N-way mirroring where N is less than the number 
of available disks.

To be honest, in the past some patches appeared that implemented a generalized 
RAID-NxM, where N is the total number of disks and M the number of redundancy 
disks; i.e. the filesystem could tolerate the loss of M disks (see 
http://www.spinics.net/lists/linux-btrfs/msg29245.html).

BR
G.Baroncelli


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli 
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5


Re: btrfs und lvm-cache?

2015-12-23 Thread Chris Murphy
On Wed, Dec 23, 2015 at 4:38 AM, Neuer User  wrote:
> On 23.12.2015 at 12:21, Martin Steigerwald wrote:
>> Hi.
>>
>> As far as I understand it, this way you basically lose the RAID 1 semantics of
>> BTRFS. While the data is redundant on the HDDs, it is not redundant on the
>> SSD. It may work for a pure read cache, but for write-through you definitely
>> lose any data integrity protection a RAID 1 gives you.
>>
> Hmm, are you sure? I thought LVM lies underneath btrfs. Btrfs thus
> should not know about the caching SSD at all. It only knows of the two
> LVs on the HDDs, reading and writing data from or to one or both of the
> two LVs.
>
> Only then lvmcache decides if it reads the data from the underlying HDD
> or from the cache ssd. LVM shouldn't even know that the two LVs are
> configured as RAID1 on btrfs as this is a level higher. So for LVM the
> two LVs are different data, both of which would need to be cached
> independently on the SSD.
>
> What might happen, though, is that there is data corruption on the SSD,
> returning a mismatching checksum, so btrfs might think that the data is
> incorrect on one LV (= HDD), although it is indeed correct there. That
> would lead btrfs to read the data from the second LV (which might also
> be in the SSD cache or not) and then update the (correct and
> identical) data of the first LV with it.

Seems to me if the LVs on the two HDDs are exposed, the lvmcache has
to separately keep track of those LVs. So as long as everything is
working correctly, it should be fine. That includes either transient
or persistent, but consistent, errors for either HDD or the SSD, and
Btrfs can fix up those bad reads with data from the other. If the SSD
were to decide to go nutty, chances are reads through lvmcache would
be corrupt no matter what LV is being read by Btrfs, and it'll be
aware of that and discard those reads. Any corrupt writes in this
case won't be immediately known by Btrfs because it (like any file
system) assumes writes are OK unless the device reports a write
failure, but those too would be found on read.

The question I have, that I don't know the answer to, is if the stack
arrives at a point where all writes are corrupt but hardware isn't
reporting write errors, and it continues to happen for a while, once
you've resolved that problem and try to mount the file system again,
how well does Btrfs disregard all those bad writes? How well would any
filesystem?



-- 
Chris Murphy


Re: btrfs und lvm-cache?

2015-12-23 Thread Neuer User
One other thing:

I read that btrfs has some options that are turned off for SSDs, as they
might be harmful or so. In my case btrfs, however, would not know about
the SSD and would probably use its HDD-optimized settings. The result,
however, would also be forwarded to the SSD via lvmcache. Do I see that
right? Would that give any serious problems?

On 23.12.2015 at 11:45, Neuer User wrote:
> Hello
> 
> I want to set up a small homeserver, based on an HP Microserver Gen8 (4GB
> RAM, 2x3TB HDD + 1x120GB SSD) and Proxmox as distro.
> 
> The server will be used to host a (small) number of virtual machines,
> most of them being LXC containers, a few being KVM machines. One of the
> LXC containers will host a fileserver with approx. 1 TB of data and another
> one a backup system for the desktops / laptops in my household, thus
> probably holding quite a lot of files. The LXC containers will use the
> filesystem of the Proxmox host, the KVM machines probably raw disk files
> (or qcow2).
> 
> I would like to combine high data integrity with some speed, so I
> thought of the following layout:
> 
> - both HDDs and the SSD in one LVM VG
> - one LV on each HDD, containing a btrfs filesystem
> - both btrfs LVs configured as RAID1
> - the single SSD used as an LVM cache device for both HDD LVs to speed up
> random access, where possible
> 
> Now, I wonder if that is a good architecture to go for. Any input on
> that? Is btrfs the right way to go, or should I rather go for ZFS
> (and purchase some more gigs of RAM)?
> 
> Will there be any problems arising from the lvmcache? btrfs only sees
> the HDDs, LVM does the SSD handling.
> 
> Thanks for any input. I like btrfs very much, but data integrity is
> important for this.
> 
> Michael
> 
> 
> 




Re: btrfs und lvm-cache?

2015-12-23 Thread Chris Murphy
On Wed, Dec 23, 2015 at 1:21 PM, Neuer User  wrote:
> On 23.12.2015 at 20:49, Chris Murphy wrote:
>> Seems to me if the LV's on the two HDDs are exposed, the lvmcache has
>> to separately keep track of those LVs. So as long as everything is
>> working correctly, it should be fine. That includes either transient
>> or persistent, but consistent, errors for either HDD or the SSD, and
>> Btrfs can fix up those bad reads with data from the other. If the SSD
>> were to decide to go nutty, chances are reads through lvmcache would
>> be corrupt no matter what LV is being read by Btrfs, and it'll be
>> aware of that and discard those reads. Any corrupt writes in this
>> case, won't be immediately known by Btrfs because it (like any file
>> system) assumes writes are OK unless the device reports a write
>> failure, but those too would be found on read.
>
> What corrupt write do you mean? The "nuts" SSD is not going to write to
> the HDDs, that will be done by lvmcache. So the HDDs should get the
> correct data, only the SSD will be bad, right?

Btrfs always writes to the 'cache LV', and then it's up to lvmcache to
determine how and when things are written to the 'cache pool LV' vs.
the 'origin LV'. I have no idea whether there's a case with writeback
mode where things are written to the SSD and only later get copied from
the SSD to the HDD, in which case a wildly misbehaving SSD might corrupt
data on the origin.

If you use writethrough, the default, then the data on the HDDs should be
fine even if the single SSD goes crazy for some reason. Even if all
reads go bad, the worst case is that Btrfs should stop and go read-only. If
the SSD read errors are more transient, then Btrfs tries to fix them with
COW writes, so even if these fixes aren't needed on the HDDs, they should
arrive safely on both HDDs and hence still no corruption.
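
For reference, a quick way to check and change the mode (LV names are made
up; lvconvert --cachemode and the dm-cache status output are how I understand
current lvm2/device-mapper, so verify against lvmcache(7)):

    # the cache mode shows up in the dm-cache status line of the cached LV
    dmsetup status vg0-hdd1
    # switch an existing cached LV to writethrough in place
    lvconvert --cachemode writethrough vg0/hdd1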

I mean *really*, if data integrity is paramount you would probably do
this with proven methods. Anything that has high IOPS, like a mail
server, just write that stuff only to the SSD, and then occasionally
rsync it to conventionally raided (md or lvm) HDDs with XFS. You could
even use LVM snapshots and do this often, and then you not only have
something fast and safe, you also have an integrated backup that's
mirrored; in a sense you have three copies. Whereas what you're
attempting is rather complicated, and while it ought to work and it
does get testing, you're really acting as a tester, not only of Btrfs
but also of lvmcache, and you're combining both tests. I'd just say
make sure you have regular backups: snapshot the rw subvolume
regularly and sync it to another filesystem, as often as the workflow
can tolerate.
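
A minimal sketch of that backup routine (paths, hostnames and the schedule
are invented; it assumes the destination is also btrfs so send/receive
works, otherwise rsync of the read-only snapshot does the same job):

    # take a read-only snapshot of the rw subvolume
    btrfs subvolume snapshot -r /mnt/data /mnt/snapshots/data-$(date +%F)
    # ship it to another filesystem / machine
    btrfs send /mnt/snapshots/data-$(date +%F) | \
        ssh backuphost btrfs receive /backup/
    # or, if the destination isn't btrfs:
    rsync -a /mnt/snapshots/data-$(date +%F)/ backuphost:/backup/data/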




>
> And that would become obvious with the next reads, in which case btrfs
> probably would throw an error as it gets crazy data from apparently both
> LVs (but only coming from the SSD). So, that could be fixed by removing
> the SSD without any data loss from the HDDs, right?

Only if you're using writethrough mode, but yes.
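
For what it's worth, lvm has commands for exactly that kind of clean
removal (LV names hypothetical); with writethrough there should be no
dirty blocks to lose:

    # detach the SSD cache but keep the cache pool LV for later reuse
    lvconvert --splitcache vg0/hdd1
    # or drop the cache pool entirely, leaving the plain HDD LV behind
    lvconvert --uncache vg0/hdd1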


>
>>
>> The question I have, that I don't know the answer to, is if the stack
>> arrives at a point where all writes are corrupt but hardware isn't
>> reporting write errors, and it continues to happen for a while, once
>> you've resolved that problem and try to mount the file system again,
>> how well does Btrfs disregard all those bad writes? How well would any
>> filesystem?
>>
> Hmm, again the writes to the HDDs should be OK. Only the SSD would have
> pretty corrupt data, right? In such a case it might depend on how much
> bad data is read back from the SSD and what the filesystem does in
> reaction to it?
>
> P.S.: Of course, one other possibility would be to use a second SSD, so
> that each LV has a separate caching SSD. In this case, there would
> always be a valid source (given that not both SSDs go nuts the same
> time...).

Simplistically, SSDs seem to fail in one of two ways: a series of transient
errors that Btrfs can pretty much always account for, or totally
faceplanting. The way they faceplant can be: all writes fail, reads
work, or the whole device just vanishes off the bus. I don't know how
that affects lvmcache writethrough if the entire cache pool vanishes.
It should still write to the HDDs but I don't know that it does.
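
One way to find out before trusting it with real data is to build the same
stack on scratch devices and simulate the SSD vanishing (device name
hypothetical; this forcibly offlines the disk, so only do it on a test box):

    # make the SSD disappear from the bus
    echo 1 > /sys/block/sdc/device/delete
    # keep writing to the btrfs and watch for errors
    dmesg | tail -n 50
    btrfs device stats /mnt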


> But I would need another slot for this. If the pros are very high,
> that's ok. If it works nicely with just one SSD, then even better.

Yeah if it's a decent name brand SSD and not one of the ones with
known crap firmware, then I think it's fine to just have one. Either
way, each origin LV gets a separate cache pool LV if I understand
lvmcache correctly.

Offhand I don't know if you need separate VGs to make sure the 'cache
LVs' you format with Btrfs in fact use different PVs as origins.
That's important. The usual lvcreate command has a way to specify one
or more PVs to use, rather than have it just grab a pile of extents
from the VG (which could be from either PV), but I don't know if
that's the way 

Re: [PATCH 1/2] fstests: fix btrfs test failures after commit 27d077ec0bda

2015-12-23 Thread Dave Chinner
On Tue, Dec 22, 2015 at 02:22:40AM +, fdman...@kernel.org wrote:
> From: Filipe Manana 
> 
> Commit 27d077ec0bda (common: use mount/umount helpers everywhere) made
> a few btrfs test fail for 2 different reasons:
> 
> 1) Some tests (btrfs/029 and btrfs/031) use $SCRATCH_MNT as a mount
>point for some subvolume created in $TEST_DEV, therefore calling
>_scratch_unmount does not work as it passes $SCRATCH_DEV as the
>argument to the umount program. This is intentional to test reflinks
>accross different mountpoints of the same filesystem but for different
>subvolumes;

The correct way to fix this is to stop abusing $SCRATCH_MNT and
to instead use a local mount point on the test device.

> 2) For multiple devices filesystems (btrfs/003 and btrfs/011) that test
>the device replace feature, we need to unmount using the mount path
>($SCRATCH_MNT) because unmounting using one of the devices as an
>argument ($SCRATCH_DEV) does not always work - after replace operations
>we get in /proc/mounts a device other than $SCRATCH_DEV associated
>with the mount point $SCRATCH_MNT (this is mentioned in a comment at
>btrfs/011 for example), so we need to pass that other device to the
>umount program or pass it the mount point.

Which says to me that _scratch_unmount should be using $SCRATCH_MNT
rather than $SCRATCH_DEV. That would fix the problem without needing
to modify any of the tests, right?
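
Something like this in common/rc, assuming the helper currently unmounts by
device (just a sketch, not the actual fstests code):

    _scratch_unmount()
    {
        $UMOUNT_PROG $SCRATCH_MNT
    }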

> Using $SCRATCH_MNT as a mountpoint for a device other than $SCRATCH_DEV is
> misleading, but that's a different problem that existed long before and
> this change attempts only to fix the regression from 27d077ec0bda.

It may be misleading, but that's the fundamental problem that needs
fixing.  As always, we should be trying to fix the root cause of the
problem, not working around it...

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: Loss of connection to Half of the drives

2015-12-23 Thread Duncan
Goffredo Baroncelli posted on Wed, 23 Dec 2015 19:20:32 +0100 as
excerpted:

> Duncan talked about N-way mirroring, where each disk contains a copy
> of the same data. Nobody talked about N-way mirroring where N is less
> than the number of available disks.

Well, to be fair, I did /try/ to talk about raid10 in the context of N-
way-mirroring, as *one*future*option*, which would let you do say 3-way-
mirroring, 2-way-striping, using six devices, giving you that choice in 
addition to the current 3-way-striping, 2-way-mirroring, which is the only 
current choice for btrfs raid10 with six devices, since it's limited to 
two-way-mirroring.

But obviously I was more confusing than clear, since you apparently 
didn't see that bit at all, and he saw it, but apparently ended up more 
confused than helped by it, possibly due to trying to apply that 
discussion to a larger scope than the limited one-future-option scope 
that I had originally intended.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



Re: Raid 5/6 Stability

2015-12-23 Thread Chris Murphy
There's a worthwhile distinction between the stability of raid56 vs. all
other profiles, and btrfs multiple-device failure behavior. Right now
there's no monitoring or notification of failures to user space. In
fact Btrfs itself doesn't really understand device failures: a device
can spit out many read or write errors and Btrfs keeps trying to read
and write. So there's no equivalent to the 'faulty' device state you get
with md/mdadm. Therefore you'll have to figure out a way to monitor
kernel messages, maybe via a script that parses for btrfs messages and
emails any such messages every 10 minutes or whatever.
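
A crude sketch of such a monitor, run from cron every 10 minutes (the mail
address, the grep pattern and the use of journalctl are all assumptions;
adjust to taste):

    #!/bin/sh
    # mail any btrfs kernel messages seen in the last 10 minutes
    MSGS=$(journalctl -k --since "10 minutes ago" | \
           grep -iE 'btrfs.*(error|warn|corrupt|csum)')
    [ -n "$MSGS" ] && echo "$MSGS" | \
        mail -s "btrfs kernel messages on $(hostname)" admin@example.com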

Chris Murphy.


Re: Raid 5/6 Stability

2015-12-23 Thread Duncan
Chris Murphy posted on Wed, 23 Dec 2015 19:38:23 -0700 as excerpted:

> There's a worthwhile distinction between stability of raid56 vs all
> other profiles, and btrfs multiple device failure behavior. Right now
> there's no monitoring or notification of failures to user space. In
> fact Btrfs itself doesn't really understand device failures, a device
> can spit out many read or write errors and Btrfs keeps trying to read
> and write. So there's no equivalent to faultiness like with md/mdadm.
> Therefore you'll have to figure out a way to monitor kernel messages,
> maybe via a script that parses for btrfs messages and emails any such
> messages ever 10m or whatever.

Absolutely.  Raid56 mode may be stabilizing, but there's still no user-
side multi-device filesystem health monitoring application, either for 
raid56 or in general, for the raid1/10 modes which are in fact reasonably 
stable and mature on btrfs and have been considered so at the level of btrfs 
itself for quite a while (several years) now.

Thanks for that addendum, Chris.  It could be quite helpful to someone 
just setting up a new installation, particularly on a server where the 
user and/or admin is unlikely to be directly observing things and thus 
know when things go wrong due to the observed change in behavior, 
regardless of formal monitoring or the lack thereof, as would likely be 
the case on a desktop/workstation.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



Re: btrfs und lvm-cache?

2015-12-23 Thread Duncan
Neuer User posted on Wed, 23 Dec 2015 11:45:28 +0100 as excerpted:

> - both hdd and ssd in one LVM VG
> - one LV on each hdd, containing a btrfs filesystem
> - both btrfs LV configured as RAID1
> - the single SSD used as an LVM cache device for both HDD LVs to speed up
> random access, where possible

I'll let others debate the lvm-cache details, which I don't know much
about, but I do have a couple points to add, one of which is detail,
one rather higher level.  The higher level one first:

1) While I've seen both bcache and lvm-cache discussed as potential
options here, there is at least one user running btrfs on top of bcache
who posts to bcache-related threads here with some regularity.
While there were some serious bugs to work through early on, his
recent posts suggest current bcache works very well with current
btrfs, and given that he has posted to several threads with some
time separation between them, he does appear to be a regular here,
and I expect he'd be posting pretty fast if things started going
buggy for him once again.

There hasn't been a corresponding regular poster here using lvm-cache,
so while it may work well, we don't know that.  At minimum, postings
thus suggest that bcache on btrfs is a better tested solution at
this point, and thus, would be recommended, while lvm-cache on btrfs,
while an equally valid technical choice in theory, doesn't have much
if any real-world data going for it at this point, and is thus
in practice an unknown.

2) Not being the person using bcache and not being familiar with it
or lvm-cache personally, I don't know how either one handles btrfs
multi-device.  However, it occurs to me that if it's necessary,
in addition to the multiple ssds suggested by the others to cover
such multi-device caching, you should also be able to partition
up the ssd, and use each partition as an individual device cache.
That's almost certainly what I'd do here if I needed to (except
that above a certain size, ssd prices per GiB start to go up
dramatically, so if I wanted total ssd cache sizes above that I'd
of course pay less for multiple smaller ssds again) instead of
fiddling with multiple physical ssds, but again, not knowing
how the caching works, I'm not sure if multiple cache devices
would be needed to cache a multi-device btrfs at the back end,
or not, so I don't know whether I'd need to bother with such
partitioning or not.

The key here is that on ssds, seek time is zero anyway, so
partitioning up the ssd and using both partitions as cache
doesn't have the latency issues that attempting to do something
like that (or for example btrfs raid1 on two partitions on the
same physical device) would have on spinning rust.
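
From the bcache docs, the setup would look roughly like this (I haven't run
this myself; device names are hypothetical, so double-check against the
docs; note that one cache set can apparently serve several backing devices,
so partitioning the ssd may not even be needed):

    # register the two HDDs as backing devices and the SSD as the cache
    make-bcache -B /dev/sda /dev/sdb      # creates bcache0 and bcache1
    make-bcache -C /dev/sdc               # whole SSD as one cache set
    # attach both backing devices to the cache set
    # (cset UUID from 'bcache-super-show /dev/sdc')
    echo <cset-uuid> > /sys/block/bcache0/bcache/attach
    echo <cset-uuid> > /sys/block/bcache1/bcache/attach
    # btrfs raid1 on top of the two cached devices
    mkfs.btrfs -m raid1 -d raid1 /dev/bcache0 /dev/bcache1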


I thought I'd throw those points out, in case you had failed to
notice bcache as an option and would prefer it as better tested,
once you knew about it, and in case the partitioned ssd idea
does help with the multi-device btrfs caching thing.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



Re: Loss of connection to Half of the drives

2015-12-23 Thread Duncan
Donald Pearson posted on Wed, 23 Dec 2015 09:53:41 -0600 as excerpted:

> Additionally real Raid10 will run circles around what BTRFS is doing in
> terms of performance.  In the 20 drive array you're striping across 10
> drives, in BTRFS right now you're striping across 2 no matter what. So
> not only do I lose in terms of resilience I lose in terms of
> performance.  I assume that N-way-mirroring used with BTRFS Raid10 will
> also increase the stripe width so that will level out the performance
> but you're always going to be short a drive for equal resilience.

No, with btrfs raid10, you're /mirroring/ across two drives no matter 
what.  With 20 devices, you're /striping/ across 10 two-way mirrors.  
It's the same as a standard raid10, in that regard.  

Tho it's a bit different in that the mix of devices forming the above can 
differ among different chunks.  IOW, the first chunk might be mirrored a/
b c/d e/f g/h i/j k/l m/n o/p q/r s/t, with the stripe across each mirror-
pair, but the next chunk might be mirrored a/l g/o f/k b/n c/d e/s j/q h/t i/p 
m/r (I think I got each letter once...), and striped across those pairs.

So you get the same performance as a normal raid10 (well, to the extent 
that btrfs has been optimized, which in large part it hasn't been, yet), 
but as should always be the case in a raid10, randomized loss of more 
than a single device can mean data loss.

But because each chunk's pair assignment is more or less randomized, you 
can't do with btrfs raid10 what a conventional raid10 lets you do: map all 
of one mirror set to one cabinet and all of the second mirror set to another 
cabinet, so that you can reliably lose an entire cabinet and be fine, since 
it's known to correspond exactly to a single mirror set.  With btrfs raid10 
there's no way to specify individual chunk mirroring, and what might be 
precisely one mirror set with one chunk is very likely to be both copies of 
some mirrors and no copies of other mirrors with another chunk.

What I was suggesting as a solution was a setup that:
(a) has btrfs raid1 at the top level
(b) has a pair of mdraidNs underneath, in this case a pair of 10-device 
mdraid10s.
(c) has the pair of mdraidNs each presented to btrfs as one of its raid1 
mirrors.

While this is actually raid01, not raid10, in this case it makes more 
sense than a mixed raid10, because by doing it that way, you'd:
1) keep btrfs' data integrity and error correction at the top level, as 
it could pull from the second copy if the first failed checksum.
2) be able to stick each underlying mdraid set in its own cabinet, so loss of 
the entire cabinet wouldn't be data loss, only redundancy loss.  (A rough 
command sketch of such a setup follows.)
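
A rough command sketch of that layout (following (b) above; 20 hypothetical
drives, names invented, untested; mdraid0 could be substituted for mdraid10
as mentioned earlier in the thread):

    # cabinet A: sda..sdj, cabinet B: sdk..sdt (placeholders)
    mdadm --create /dev/md0 --level=10 --raid-devices=10 /dev/sd[a-j]
    mdadm --create /dev/md1 --level=10 --raid-devices=10 /dev/sd[k-t]
    # btrfs raid1 (data and metadata) across the two arrays
    mkfs.btrfs -m raid1 -d raid1 /dev/md0 /dev/md1
    mount /dev/md0 /mnt    # after a btrfs device scan, either member works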

(Reversing that, btrfs raid0 on top of mdraid1, would lose btrfs' ability 
to correct checksum errors as at the btrfs level, it'd be non-redundant, 
and mdraid1 doesn't have checksumming, so it couldn't provide the same 
data integrity service.  Without checksumming and the ability to pull from 
the other copy in case of error, you could scrub the mdraid1 to make its mirrors 
identical again, but you'd be just as likely to copy the bad one to the 
good one as the reverse.  Thus, btrfs really needs to be the raid1 layer 
unless you simply don't care about data integrity, and because btrfs is 
the filesystem layer, it has to be the top layer, so you're left doing a 
raid01 instead of the raid10 that's ordinarily preferred due to locality 
of a rebuild, absent other factors like this data integrity factor.)

And what btrfs N-way-mirroring will provide, in the longer term once 
btrfs gets that feature and it stabilizes to usability, is the ability to 
actually have three cabinets, and sustain the loss of two, or four 
cabinets, and sustain the loss of three, etc.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



Re: Raid 5/6 Stability

2015-12-23 Thread Duncan
jwalmer posted on Wed, 23 Dec 2015 17:52:10 -0500 as excerpted:

> Just an avid follower of the project checking in. It has been about nine
> months since the initial Raid 5/6 features were released in 3.19 and
> they are still listed as incomplete/experimental on the Wiki.
> 
> Admittedly, I don't understand how such a large and distributed project
> prioritizes features for development, but I haven't been able to find a
> clear roadmap anywhere.
> 
> I'm wondering if anyone here is able to give me some insight about when
> the Raid 5/6 feature will next be updated, or even when they are
> scheduled to lose their incomplete/experimental designation.

Addressing the wiki side first, then the question you're probably more 
interested in. =:^)

FWIW, the wiki gets updated... when a volunteer (which could be you =:^) 
updates it.  It often has quite current information... somewhere on the 
wiki, but often not all mentions of a feature get updated at the same 
time, and some may lag behind.

That said, while btrfs raid56 is no longer experimental, I'd not call it 
entirely stable, even to the point of the rest of btrfs (which is 
stabilizing but not fully stable or mature yet), just yet.

I've personally long stated that raid56 feature stability, to the point 
of the rest of btrfs anyway, can be expected roughly a year after nominal 
feature completion, with an additional requirement of at least two kernel 
cycles without major bugs in the feature.  At five kernel releases a year 
that would put it more or less at 4.4, which is soon to be released and 
quite good timing, as 4.4 is an LTS release, and indeed, the last major 
raid56 bug was fixed early in the 4.2 cycle (well before 4.2 release), so 
4.4 meets the requirement in that regard as well. =:^)

Now I'm just an active list regular and btrfs user, not a dev, but I 
began making that recommendation/prediction before 3.19's release, when 
it was clear 3.19 would bring nominal raid56 code completion, and in the 
immediately following releases as well, when people were (I thought) 
jumping the gun, and indeed, getting their data eaten by remaining 
critical bugs.  Nobody has argued otherwise in the intervening 
time, so I'd suggest it's a reasonably solid recommendation. 

So 4.4 is what I'd consider the magical raid56-stability release, and I'd 
actually expect the wiki to be updated shortly thereafter, tho 4.4 is 
close enough now, and there have been no major raid56 bugs reported in 
the 4.3 and 4.4 cycles, that arguably the wiki's raid56 status could be 
updated now to reflect that.

(Personally, I'm more a newsgroups and mailing lists guy, and while I 
read web/wiki resources and will in fact often quote them, I tend to 
treat them as read-only and very seldom personally edit them, leaving 
that to others, who occasionally even quote my list posts more or less 
verbatim when they update the wiki.  So again, you're invited to do so if 
that's your thing, but it's nothing I'm likely to do personally.  And 
FWIW, there are a few folks that watch wiki updates and revert spam and 
anything crazy, so as long as the edits are honestly trying to make 
things better, any help you can be in editing the wiki is highly 
appreciated, and you don't have to worry too much about any mistakes you 
inadvertently make, as others will be along to fix them. =:^)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



Re: btrfs und lvm-cache?

2015-12-23 Thread Holger Hoffstätte
On 12/23/15 21:07, Neuer User wrote:
> Understood. However, do SSDs really do automatic deduplication? I might
> be completely wrong here, but that sounds to be a rather complex
> mechanism, requiring lots of RAM to deduplicate 100 GB. I wouldn't have
> thought that typical SSDs include that?

tl;dr: no, because delta encoding/write buffer coalescing is not dedupe.

This is one of those persistent myths that has been kept alive by the
internet rumor machine. It has its roots in a series of blog articles [1]
and turned out to be panic coupled with FUD and fueled by a lack of factual
information.

I suggest everyone read the article(s), ALL the comments and then get back
to drinking. :o)

In SSD arrays dedupe is generally seen as a good thing.

-h

[1] http://storagemojo.com/2011/06/27/de-dup-too-much-of-good-thing/



Re: Loss of connection to Half of the drives

2015-12-23 Thread Chris Murphy
On Wed, Dec 23, 2015 at 3:15 PM, Donald Pearson
 wrote:
> On Wed, Dec 23, 2015 at 12:20 PM, Goffredo Baroncelli

>> Duncan talked about N-way mirroring, where each disk contains a copy of 
>> the same data. Nobody talked about N-way mirroring where N is less than the 
>> number of available disks.
>>
>
> Well that was certainly implied as the unimplemented solution to
> dropping half the drives that the OP tested.  N-way mirroring where N
> = the number of drives is just Raid1 on crack and not the Raid10
> use-case that the OP is asking about.

How does the OP's use case normally get implemented? For separate
controllers, this would need to be software raid10, but you'd need a
way to specify the drive pairings. How does mdadm create -l raid10
enable that? Or to make absolutely certain, do you put them all in a
container and then first create -l raid1, and then second create -l
raid0?
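
For what it's worth, I believe plain mdadm raid10 with the default 'near'
layout mirrors adjacent devices in the order they're listed, so the pairing
can be controlled by alternating devices from the two controllers on the
command line (hypothetical 4-disk example; worth verifying against md(4)):

    # near-2 layout: copies land on adjacent listed devices, so here
    # sda/sdc form one mirror pair and sdb/sdd the other
    mdadm --create /dev/md0 --level=10 --layout=n2 --raid-devices=4 \
          /dev/sda /dev/sdc /dev/sdb /dev/sdd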

In any case, what you get is drive level granularity for mirroring. A
drive has an exact (excluding layout options, but still data exact)
copy. That's not true with Btrfs where the granularity is the data
chunk (1+GiB). A given drive's chunks will definitely have copies on
multiple drives rather than on a single drive. And those multiple
drives will variably be on both sides of a controller or drive
make/model division.

One of the major differences of Btrfs with all profiles is that it
deals with different sized devices elegantly. That's because of the
chunk level granularity.

So I think that having mirrors of drives rather than chunks means that
we have to have exact size drive pairings.



-- 
Chris Murphy


Re: Ideas on unified real-ro mount option across all filesystems

2015-12-23 Thread Stewart Smith
Eric Sandeen  writes:
>> 3) A lot of users don't even know that mounting ro can still modify the device
>>    Yes, I didn't know this point until I checked the log replay code of
>>    btrfs.
>>    Adding such a mount option alias may raise some attention from users.
>
> Given that nothing in the documentation implies that the block device itself
> must remain unchanged on a read-only mount, I don't see any problem which
> needs fixing.  MS_RDONLY rejects user IO; that's all.
>
> If you want to be sure your block device rejects all IO for forensics or
> what have you, I'd suggest # blockdev --setro /dev/whatever prior to mount,
> and take it out of the filesystem's control.  Or better yet, making an
> image and not touching the original.

What we do for the petitboot bootloader in POWER and OpenPower firmware
(a Linux+initramfs that does kexec to boot) is use device mapper to make
a snapshot in memory and run recovery on that (for some filesystems;
notably XFS is different because its journal is not endian-safe). We
also have to have an option *not* to do that, just in case there's a bug
in journal replay... and we're lucky in that we probably do have enough
memory to complete replay; this solution could be completely impossible
on lower-memory machines.
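
Concretely, the in-memory snapshot is roughly this (devices and sizes
invented; the COW space is just a sparse file in tmpfs):

    # non-persistent dm snapshot of /dev/sdb1 with its COW space in RAM
    SIZE=$(blockdev --getsz /dev/sdb1)        # device size in 512-byte sectors
    truncate -s 512M /dev/shm/cow.img         # scratch COW space
    COW=$(losetup --find --show /dev/shm/cow.img)
    echo "0 $SIZE snapshot /dev/sdb1 $COW N 8" | dmsetup create rescue-snap
    # journal replay during this mount hits the COW device, not /dev/sdb1
    mount /dev/mapper/rescue-snap /mnt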

As such, I believe we're the only bit of firmware/bootloader ever that
has correctly parsed a journalling filesystem.

-- 
Stewart Smith




Re: Loss of connection to Half of the drives

2015-12-23 Thread Donald Pearson
On Wed, Dec 23, 2015 at 12:20 PM, Goffredo Baroncelli
 wrote:
> On 2015-12-23 16:53, Donald Pearson wrote:
> [...]
>>
>> Additionally real Raid10 will run circles around what BTRFS is doing
>> in terms of performance.  In the 20 drive array you're striping across
>> 10 drives, in BTRFS right now you're striping across 2 no matter what.
>> So not only do I lose in terms of resilience I lose in terms of
>> performance.  I assume that N-way-mirroring used with BTRFS Raid10
>> will also increase the stripe width so that will level out the
>> performance but you're always going to be short a drive for equal
>> resilience.
>
> In the case of RAID10, to the best of my knowledge, BTRFS allocates each chunk 
> across *all* the available devices. It uses the usual RAID0 (== striping) over 
> a RAID1 (mirroring).
>
> What you are describing is BTRFS RAID1, i.e. LINEAR over a RAID1: each 
> chunk is allocated on *two*, and only *two*, different disks from the disk pool; 
> the disks are the ones with the largest free space. Each chunk may be 
> allocated on a different *pair* of disks.
>

Okay, so however the chunk is divided up, 2 copies of each chunk
division are written somewhere.  So I misunderstood, thanks for
clearing it up!

>> And finally the elephant in the room that comes with the necessary
>> 11-way mirroring is that the usable capacity of that 20 drive array.
>> Remember, pea brain so my math may be wrong in application and
>> calculation but if it's made of 1T drives for 20T raw, there is only
>> 1.82T usable (20 / 11) and if I'm completely off in that figure the
>> point is still that such a high level of mirroring is going to
>> excessively consume drive space.
>
> Duncan talked about N-way mirroring, where each disk contains a copy of the 
> same data. Nobody talked about N-way mirroring where N is less than the 
> number of available disks.
>

Well that was certainly implied as the unimplemented solution to
dropping half the drives that the OP tested.  N-way mirroring where N
= the number of drives is just Raid1 on crack and not the Raid10
use-case that the OP is asking about.

> To be honest, in the past some patches appeared that implemented a generalized 
> RAID-NxM, where N is the total number of disks and M the number of redundancy 
> disks; i.e. the filesystem could tolerate the loss of M disks (see 
> http://www.spinics.net/lists/linux-btrfs/msg29245.html).
>
> BR
> G.Baroncelli
>
>
> --
> gpg @keyserver.linux.it: Goffredo Baroncelli 
> Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5


Yeah that whole thing is pretty upsetting.


Re: Ideas to do custom operation just after mount?

2015-12-23 Thread Dave Chinner
On Mon, Dec 21, 2015 at 01:18:22PM +0800, Anand Jain wrote:
> 
> 
> >BTW, any good ideas on how btrfs could handle such operations, like
> >enabling/disabling some minor features? Especially when they can be set
> >on individual files/dirs.
> >
> >Features like incoming write-time deduplication are designed to be
> >enabled/disabled for individual files/dirs, so a mount option is not a
> >good fit for them.
> >
> >Although some features, like btrfs quota (qgroup), should be
> >implemented via a mount option.
> >I don't understand why qgroup is enabled/disabled by ioctl. :(
> 
> 
> A mount option won't persist across systems/computers unless
> a human remembers it.

So record the mount option you want persistent in the filesystem at
mount time, and don't turn it off until the corresponding "no-" mount
option is provided at a later mount.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: btrfs und lvm-cache?

2015-12-23 Thread Chris Murphy
On Wed, Dec 23, 2015 at 1:24 PM, Neuer User  wrote:
> One other thing:
>
> I read that btrfs has some options that are turned off for SSDs as they
> might be harmful or so. In my case btrfs, however, would not know about
> the SSD and would probably use its HDD-optimized settings. The result,
> however, would also be forwarded to the SSD via lvmcache. Do I see that
> right? Would that give any serious problems?

No, Btrfs is fine for SSDs with or without the optimization, and with
the optimization it's OK for hard drives also. I think you're unlikely to
notice any difference, but you can test it if you want with the mount
options ssd or nossd, depending on how the cache LV is detected (I'd
guess it's detected as non-rotational, so the ssd option is the default).
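
If you want to check or override it yourself (LV name hypothetical):

    # 0 = non-rotational (ssd heuristics), 1 = rotational
    cat /sys/block/$(basename $(readlink -f /dev/vg0/hdd1))/queue/rotational
    # and force the choice explicitly at mount time if you prefer
    mount -o nossd /dev/vg0/hdd1 /mnt    # or '-o ssd'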


-- 
Chris Murphy


Raid 5/6 Stability

2015-12-23 Thread jwalmer
Hello dev crew,

Just an avid follower of the project checking in. It has been about nine months 
since the initial Raid 5/6 features were released in 3.19 and they are still 
listed as incomplete/experimental on the Wiki.

Admittedly, I don't understand how such a large and distributed project 
prioritizes features for development, but I haven't been able to find a clear 
roadmap anywhere. 

I'm wondering if anyone here is able to give me some insight about when the 
Raid 5/6 feature will next be updated, or even when they are scheduled to lose 
their incomplete/experimental designation.

Thanks!


Re: btrfs und lvm-cache?

2015-12-23 Thread Noah Massey
On Wed, Dec 23, 2015 at 6:38 AM, Neuer User  wrote:
> On 23.12.2015 at 12:21, Martin Steigerwald wrote:
>> Hi.
>>
>> As far as I understand it, this way you basically lose the RAID 1 semantics of
>> BTRFS. While the data is redundant on the HDDs, it is not redundant on the
>> SSD. It may work for a pure read cache, but for write-through you definitely
>> lose any data integrity protection a RAID 1 gives you.
>>
> Hmm, are you sure? I thought LVM lies underneath btrfs. Btrfs thus
> should not know about the caching SSD at all. It only knows of the two
> LVs on the HDDs, reading and writing data from or to one or both of the
> two LVs.

I believe Martin's concern is two-fold:

The first, major issue concerns the writeback cache mode, which makes
the SSD a single point of failure. (In writeback mode, a write to a
cached block goes only to the cache, and the block is marked dirty in
the cache metadata.) If the SSD fails with dirty data in the cache
which has not been flushed to the backing devices, the filesystem may
be in an unrecoverable state, because writes which BTRFS was told had
succeeded are not present on disk.

The second potential issue is that if the SSD performs internal
deduplication, the two copies of cached data (contents on drive 1,
content on drive 2) may actually be a reference to the same bits of
internal storage, meaning a single corruption will affect both cached
copies. If in writeback, then corrupted data could flush down to both
disks. I'm not sure what would happen in writethrough.

~ Noah


Re: btrfs und lvm-cache?

2015-12-23 Thread Neuer User
On 23.12.2015 at 20:45, Noah Massey wrote:
> On Wed, Dec 23, 2015 at 6:38 AM, Neuer User  wrote:
> I believe Martin's concern is two-fold:
> 
> The first, major issue concerns the writeback cache mode, which makes
> the SSD a single point of failure. (In writeback mode, a write to a
> cached block goes only to the cache, and the block is marked dirty in
> the cache metadata.) If the SSD fails with dirty data in the cache
> which has not been flushed to the backing devices, the filesystem may
> be in an unrecoverable state, because writes which BTRFS was told had
> succeeded are not present on disk.
OK, I see. Would it help if the cache were set to writethrough, then? In
that case the data on the HDDs should always be OK, right? (At least as
long as the HDDs are fine.)
> 
> The second potential issue is that if the SSD performs internal
> deduplication, the two copies of cached data (contents on drive 1,
> content on drive 2) may actually be a reference to the same bits of
> internal storage, meaning a single corruption will affect both cached
> copies. If in writeback, then corrupted data could flush down to both
> disks. I'm not sure what would happen in writethrough.
> 
Understood. However, do SSDs really do automatic deduplication? I might
be completely wrong here, but that sounds like a rather complex
mechanism, requiring lots of RAM to deduplicate 100 GB. I wouldn't have
thought that typical SSDs include that?

> ~ Noah
> 




Re: btrfs und lvm-cache?

2015-12-23 Thread Neuer User
On 23.12.2015 at 20:49, Chris Murphy wrote:
> Seems to me if the LV's on the two HDDs are exposed, the lvmcache has
> to separately keep track of those LVs. So as long as everything is
> working correctly, it should be fine. That includes either transient
> or persistent, but consistent, errors for either HDD or the SSD, and
> Btrfs can fix up those bad reads with data from the other. If the SSD
> were to decide to go nutty, chances are reads through lvmcache would
> be corrupt no matter what LV is being read by Btrfs, and it'll be
> aware of that and discard those reads. Any corrupt writes in this
> case, won't be immediately known by Btrfs because it (like any file
> system) assumes writes are OK unless the device reports a write
> failure, but those too would be found on read.

What corrupt write do you mean? The "nuts" SSD is not going to write to
the HDDs, that will be done by lvmcache. So the HDDs should get the
correct data, only the SSD will be bad, right?

And that would become obvious with the next reads, in which case btrfs
probably would throw an error as it gets crazy data from apparently both
LVs (but only coming from the SSD). So, that could be fixed by removing
the SSD without any data loss from the HDDs, right?

> 
> The question I have, that I don't know the answer to, is if the stack
> arrives at a point where all writes are corrupt but hardware isn't
> reporting write errors, and it continues to happen for a while, once
> you've resolved that problem and try to mount the file system again,
> how well does Btrfs disregard all those bad writes? How well would any
> filesystem?
> 
Hmm, again the writes to the HDDs should be OK. Only the SSD would have
pretty corrupt data, right? In such a case it might depend on how much
bad data is read back from the SSD and what the filesystem does in
reaction to it?

P.S.: Of course, one other possibility would be to use a second SSD, so
that each LV has a separate caching SSD. In this case, there would
always be a valid source (given that not both SSDs go nuts the same
time...).

But I would need another slot for this. If the pros are very high,
that's ok. If it works nicely with just one SSD, then even better.



btrfs und lvm-cache?

2015-12-23 Thread Neuer User
Hello

I want to set up a small homeserver, based on an HP Microserver Gen8 (4GB
RAM, 2x3TB HDD + 1x120GB SSD) and Proxmox as distro.

The server will be used to host a (small) number of virtual machines,
most of them being LXC containers, a few being KVM machines. One of the
LXC containers will host a fileserver with approx. 1 TB of data and another
one a backup system for the desktops / laptops in my household, thus
probably holding quite a lot of files. The LXC containers will use the
filesystem of the Proxmox host, the KVM machines probably raw disk files
(or qcow2).

I would like to combine high data integrity with some speed, so I
thought of the following layout:

- both HDDs and the SSD in one LVM VG
- one LV on each HDD, containing a btrfs filesystem
- both btrfs LVs configured as RAID1
- the single SSD used as an LVM cache device for both HDD LVs to speed up
random access, where possible (a rough command sketch follows below)
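
Roughly, I was thinking of something like this (untested; device names and
cache sizes are placeholders, and the lvmcache steps are as I understand
lvmcache(7)):

    # /dev/sda, /dev/sdb = 3TB HDDs, /dev/sdc = 120GB SSD (placeholders)
    pvcreate /dev/sda /dev/sdb /dev/sdc
    vgcreate vg0 /dev/sda /dev/sdb /dev/sdc
    # one LV per HDD, pinned to its own PV
    lvcreate -n hdd1 -l 100%PVS vg0 /dev/sda
    lvcreate -n hdd2 -l 100%PVS vg0 /dev/sdb
    # one cache pool per HDD LV, both carved from the SSD
    lvcreate --type cache-pool -n cache1 -L 55G vg0 /dev/sdc
    lvcreate --type cache-pool -n cache2 -L 55G vg0 /dev/sdc
    lvconvert --type cache --cachepool vg0/cache1 vg0/hdd1
    lvconvert --type cache --cachepool vg0/cache2 vg0/hdd2
    # btrfs RAID1 across the two (cached) LVs
    mkfs.btrfs -m raid1 -d raid1 /dev/vg0/hdd1 /dev/vg0/hdd2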

Now, I wonder if that is a good architecture to go for. Any input on
that? Is btrfs the right way to go, or should I rather go for ZFS
(and purchase some more gigs of RAM)?

Will there be any problems arising from the lvmcache? btrfs only sees
the HDDs, LVM does the SSD handling.

Thanks for any input. I like btrfs very much, but data integrity is
important for this.

Michael

