Fedora 33 System-Wide Change proposal: Make btrfs the default file system for desktop variants

On Fri, Jul 10, 2020 at 1:45 PM Tomasz Torcz  wrote:
>
> On Fri, Jul 10, 2020 at 07:14:09PM +0200, Vitaly Zaitsev via devel wrote:
> > On 26.06.2020 16:42, Ben Cotton wrote:
> > > ** transparent compression: significantly reduces write amplification,
> > > improves lifespan of storage hardware
> >
> > What can you say about this? https://arxiv.org/pdf/1707.08514.pdf
>
>   Also funny note: when compression was introduced in ZFS, circa 2007,
> it was mainly promoted as _performance_ win, not a space saving measure.
> This was still 5 years before NVMe, so all we had was SATA, SAS and FC
> drives, yet the CPUs were already multi-core and multi-gigahertz.
> Transferring uncompressed data was _slower_ than compressing/decompressing
> and having to transfer less data.  For a bit higher CPU usage we got
> noticeable bandwidth wins.
>   The tradeoff is no longer there, as single drives reach 7GiB/s
> transfer speed.

It would need to be benchmarked. The CPU in these cases has also
improved dramatically, perhaps more significantly than storage
performance, in which case compression may still not be the
limiting factor. lzbench is useful for this. Compiling it on Fedora is
straightforward but needs this hint, or some improved understanding of
the problem:

https://github.com/inikep/lzbench/issues/69

Note: you should use -b 128K, since the Btrfs compression block size is
128KiB. There are a variety of corpora available; I use silesia.tar:

http://sun.aei.polsl.pl/~sdeor/index.php?page=silesia

But you can also just tar /usr or /home.
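To get a feel for why the block size matters, here is a minimal sketch (not lzbench itself) that compresses a file in independent 128 KiB chunks, the unit mentioned above, and compares that with compressing it as one stream. It assumes Python's standard-library zlib as a stand-in for zstd, so absolute ratios will differ from a zstd mount. It also assumes 4 KiB sectors and, as a simplification, that a chunk which doesn't save at least one sector is stored uncompressed. The names chunked_ratio, CHUNK and SECTOR are illustrative, not part of any tool.

```python
import sys
import zlib

CHUNK = 128 * 1024  # Btrfs compresses data in chunks of up to 128 KiB
SECTOR = 4096       # assumption: 4 KiB sectors, so on-disk sizes round up


def round_up(n: int, align: int = SECTOR) -> int:
    """Round n up to the next multiple of align."""
    return -(-n // align) * align


def chunked_ratio(data: bytes, level: int = 1) -> float:
    """Estimate the compression ratio when data is compressed in independent
    128 KiB chunks, with zlib standing in for the real Btrfs compressor."""
    raw_total = 0
    stored_total = 0
    for off in range(0, len(data), CHUNK):
        chunk = data[off:off + CHUNK]
        raw = round_up(len(chunk))
        packed = round_up(len(zlib.compress(chunk, level)))
        # Simplification: keep the chunk uncompressed unless compression
        # saves at least one sector.
        stored_total += packed if packed < raw else raw
        raw_total += raw
    return raw_total / stored_total if stored_total else 1.0


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python3 chunked.py silesia.tar  (or a tarball of /usr or /home)
    with open(sys.argv[1], "rb") as f:
        blob = f.read()  # whole corpus in memory; fine for a sketch
    print(f"128 KiB chunks: {chunked_ratio(blob):.2f}")
    print(f"single stream:  {len(blob) / len(zlib.compress(blob, 1)):.2f}")
```

For most corpora the chunked figure should come out lower than the single-stream one, because every chunk starts compressing with an empty history.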

This benchmark introduces some error. Btrfs compression is
per file. Files smaller than 128K tend to have a lower compression
ratio, so in this regard lzbench overestimates compression. On the
other hand, Btrfs can store small files as inline extents, and in that
regard lzbench underestimates the savings (or, more correctly,
overestimates the actual cost of the write). Another error is
single-thread vs multi-thread compression, and single-queue vs
multi-queue block devices. Yet another is that lzbench has essentially
no latency: it's just one file being tested. In real-world usage, many
files are being read and written, each with latency, during which time
compression can happen at essentially no additional latency cost. But
not always at no cost. So it's actually really complicated, which is
probably why no one really wants to do this kind of detailed
benchmarking analysis. We're probably better off making a new benchmark
based on ordinary things: compiling the kernel, launching applications,
doing updates, git updates and git log searches, etc. But even that is
just a guess.
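The per-file effect can be illustrated with a small sketch. It compresses a set of small files one at a time (roughly what Btrfs does) and again as a single concatenated stream (roughly what a benchmark fed one tarball sees). It uses zlib as a stand-in compressor and a synthetic, hypothetical corpus. It ignores inline extents and sector rounding, so it only shows the direction of the error, not its size on a real system.

```python
from __future__ import annotations

import zlib


def per_file_vs_stream(files: list[bytes], level: int = 1) -> tuple[float, float]:
    """Return (per_file_ratio, stream_ratio) for a set of small files.

    per_file_ratio compresses each file on its own, roughly as Btrfs does;
    stream_ratio compresses their concatenation, roughly as a benchmark fed
    a single tarball does. Sector rounding and inline extents are ignored.
    """
    total = sum(len(f) for f in files)
    per_file = sum(len(zlib.compress(f, level)) for f in files)
    stream = len(zlib.compress(b"".join(files), level))
    return total / per_file, total / stream


if __name__ == "__main__":
    # Hypothetical corpus: 200 similar text files of about 2 KiB each.
    corpus = [(f"file {i}: " + "lorem ipsum dolor sit amet " * 80).encode()
              for i in range(200)]
    pf, st = per_file_vs_stream(corpus)
    print(f"per-file ratio {pf:.1f} vs single-stream ratio {st:.1f}")
```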

That reminds me: a git-based approach for aging a file system.
https://www.usenix.org/system/files/hotstorage19-paper-conway.pdf
https://github.com/saurabhkadekodi/geriatrix

I haven't messed around with that, but maybe someone wants to turn
that into a how-to. I'll do the testing if no one wants to burn their
SSD with writes. I've got a Samsung 840 EVO on an old laptop that I'm
actively trying to kill off.

Something that can't be accounted for without blind studies involving
users is that users are hypersensitive to some latencies and not at all
sensitive to others. I haven't dug up any research on this, but I
imagine it has been done. Apple made a bunch of UI changes early in the
Mac OS X development cycle, and while overall latencies were lower as a
result of having an (almost) preemptive multitasking OS instead of the
former cooperative multitasking OS, the GUI had so many "eye candy"
special effects that users got pissed at how slow the OS seemed.

-- 
Chris Murphy
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org

Re: Fedora 33 System-Wide Change proposal: Make btrfs the default file system for desktop variants

On Sat, Jul 11, 2020 at 6:11 AM Artem Tim  wrote:
>
> BTRFS WA is ~8 times higher than ext4. Average profit from compression about 
> 50% max. Not that hard arithmetic.

The paper is with respect to metadata write amplification. This has no
effect on data writes. Compression applies to data writes, not
metadata. As the data amount is significantly larger than metadata
(the file system itself), any reduction in data writes overwhelms the
metadata writes.
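The disagreement is easier to see as arithmetic. Below is a minimal toy model, assuming total writes are simply compressed data plus amplified metadata. Artem's figures (8x metadata write amplification, a 2:1 best-case compression ratio) appear only as hypothetical inputs, and the split of 100 units of data to 2 units of metadata is made up for illustration. The paper's 8x was measured on a small-write-plus-fsync() workload, so treating it as a global multiplier is itself an assumption.

```python
def net_writes(data: float, metadata: float, meta_wa: float,
               compression_ratio: float) -> float:
    """Toy model: compression shrinks only data writes, and the metadata
    write amplification factor multiplies only metadata writes."""
    return data / compression_ratio + metadata * meta_wa


def break_even_metadata(data: float, meta_wa: float,
                        compression_ratio: float) -> float:
    """Metadata volume at which compressed data + amplified metadata equals
    uncompressed data + unamplified metadata (meta_wa must exceed 1)."""
    return data * (1 - 1 / compression_ratio) / (meta_wa - 1)


# Hypothetical workload: 100 units of data, 2 units of metadata.
baseline = net_writes(100, 2, meta_wa=1, compression_ratio=1.0)
btrfs_like = net_writes(100, 2, meta_wa=8, compression_ratio=2.0)
print(baseline, btrfs_like)              # 102.0 66.0
print(break_even_metadata(100, 8, 2.0))  # about 7.14
```

Under these assumptions, compression only loses once metadata writes exceed roughly 7% of data writes (100/14). That is why the real data-to-metadata ratio of a workload matters more than either headline number.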

-- 
Chris Murphy

Re: Fedora 33 System-Wide Change proposal: Make btrfs the default file system for desktop variants

On Fri, Jul 10, 2020 at 11:14 AM Vitaly Zaitsev via devel
 wrote:
>
> On 26.06.2020 16:42, Ben Cotton wrote:
> > ** transparent compression: significantly reduces write amplification,
> > improves lifespan of storage hardware
>
> What can you say about this? https://arxiv.org/pdf/1707.08514.pdf

The paper states its bias in the conclusion. It is a conjecture.
Using worst-case scenario testing of file systems in use (and they do
in fact behave this way), they're trying to demonstrate that a new
file system needs to be developed, and that for the use case they have
in mind all of the evaluated general-purpose file systems are
disqualified. If you aren't looking to disqualify all general-purpose
file systems for your use case, this is not the paper for you.

Intentionally left unexplored are various file system optimizations
that mitigate this problem, as well as real-world general-purpose
workloads. In the case of Btrfs, those optimizations include delayed
allocation, treelog, inline extents, and the default 16KiB leaf size.

The paper entirely discounts workloads where fsync() isn't used, and
admits as much: "We should note that write amplification is
high in our workloads because we do small writes followed by a
fsync()." On a general-purpose file system, small writes followed by
fsync() happen quite a lot less than this, and on Btrfs many of those
small file writes will be inline extents, i.e. they are stored inside
the 16KiB leaf along with their inode entry. In the case of many
recurring writes, the actual write pattern coalesces many file changes
into the same leaf that's going to be written anyway. Yes, there is a
big hit for that first write, but all the other writes are cheaper,
maybe even free, if they happen inside the commit window. It's also a
good reason not to fsync() the heck out of everything needlessly.
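Here is a toy model of that coalescing, under loudly hypothetical assumptions. Each small file's inode item plus inline data occupies item_bytes of leaf space, and leaf headers are ignored. An fsync() after every file forces one 16 KiB leaf write per file. Without fsync(), all changes in one commit window are packed into as few leaves as possible. Real Btrfs behaviour (the tree log, node splits, the path up to the root) is more involved, and leaf_writes is an illustrative name, not a kernel function.

```python
import math

LEAF = 16 * 1024  # default Btrfs leaf size, per the text above


def leaf_writes(n_files: int, item_bytes: int, fsync_each: bool) -> int:
    """Toy count of 16 KiB leaf writes needed for n_files small inline files."""
    if n_files <= 0:
        return 0
    if fsync_each:
        return n_files  # every fsync() pays for its own leaf write
    per_leaf = max(1, LEAF // item_bytes)  # items that fit in one leaf
    return math.ceil(n_files / per_leaf)   # coalesced within one commit


for fsync_each in (True, False):
    n = leaf_writes(n_files=100, item_bytes=1024, fsync_each=fsync_each)
    print(f"fsync after each file={fsync_each}: {n} leaves, {n * LEAF // 1024} KiB")
```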

Finally, they are only looking at metadata writes. These are a tiny
amount of writes compared to the data payload. Any compression of data
will produce an overwhelming reduction in net write amplification.

If we look at another paper with a different bias that's already been
cited in devel@ discussions, "Evaluating File System Reliability
on Solid State Drives" by Jaffer et al., they say: "Most notably Btrfs
[46], a copy-on-write file system which is more suitable for SSDs
with no in-place writes, has garnered wide adoption. The design of
Btrfs is particularly interesting as it has fewer total writes than
ext4’s journaling mechanism." How do we square this statement with the
previous paper? They are looking at different workloads.

-- 
Chris Murphy

Re: Fedora 33 System-Wide Change proposal: Make btrfs the default file system for desktop variants

On Sat, Jul 11, 2020 at 5:55 AM Antti  wrote:

> For example, btrfs has for a long time had this issue where, after several
> months and with maybe more than 75% of disk space in use, when run on SSDs
> the system can randomly stop reading from the file system, start thinking
> and then eventually return. With each freeze the condition gets worse, and
> eventually the system is permanently stuck and a power reset is
> required.

This is not normal and not acceptable. It is unfortunately true that
there is a disproportionate burden placed on those having problems no
one else is having. And troubleshooting amounts to either poking it
with a stick (try this! no, try this! ok, now try this!) or providing
sufficiently detailed reproduction steps. And that's tedious too.

> This happens, for example, when you open the Gnome Shell application
> launcher several times in a row: the likelihood that Gnome completely
> freezes for anywhere from a few seconds up to one minute increases. I don't
> see this behaviour when using any other file system, so I've attributed it
> to btrfs, but I have no way of knowing whether it is an actual issue in
> btrfs, other than that it stopped when the disk was formatted with anything else.

My suggestion for any such freeze/hang is to issue sysrq+t. This might
not be easy to do at exactly the time of the hang, because the hang
prevents it from being typed fast enough. Two workarounds: (a) a remote
ssh session with sysrq+t typed out and ready, so you just hit enter;
(b) netconsole, same concept. Reproduce the problem and then hit enter.
Then file a bug and attach the output of
'journalctl -k -o short-monotonic > bug#_journal.txt'. The default
dmesg buffer will likely be too small to hold everything, but the
journal will have it. That should expose the nature of the hang.

If kernel messages show a task blocked for 2 minutes, it's better to
use sysrq+w.
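For the ssh variant, the usual non-keyboard route is writing the key to /proc/sysrq-trigger as root, which is allowed even when the keyboard sysrq combination is disabled. Below is a minimal sketch. It assumes root, a systemd journal, and the kernel hung-task watchdog's "blocked for more than N seconds" wording as the cue for choosing sysrq+w. The helper names are hypothetical.

```python
import subprocess

SYSRQ_TRIGGER = "/proc/sysrq-trigger"  # writing a key here requires root


def pick_sysrq_key(kernel_log: str) -> str:
    """'w' dumps blocked (uninterruptible) tasks, 't' dumps all tasks.
    Prefer 'w' once the hung-task watchdog has reported a blocked task."""
    return "w" if "blocked for more than" in kernel_log else "t"


def trigger_sysrq(key: str, trigger: str = SYSRQ_TRIGGER) -> None:
    """Same effect as pressing Alt+SysRq+<key> on the local console."""
    with open(trigger, "w") as f:
        f.write(key)


def save_kernel_journal(out_path: str) -> None:
    """Equivalent to: journalctl -k -o short-monotonic > out_path"""
    with open(out_path, "w") as out:
        subprocess.run(["journalctl", "-k", "-o", "short-monotonic"],
                       stdout=out, check=True)


# Prepared in the ssh session before reproducing the hang (run as root):
#   log = subprocess.run(["journalctl", "-k"], capture_output=True,
#                        text=True).stdout
#   trigger_sysrq(pick_sysrq_key(log))
#   save_kernel_journal("bug_journal.txt")
```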

In this case it's not necessary to have extremely detailed
reproduction steps, nor wait for someone to have a properly aged
system to see what's going on.

> Also notice that I wrote "maybe 75% full", because there is no way to know
> the actual free disk space from "df -h" alone. The btrfs FAQ pages have
> sections explaining that df lies about disk space on btrfs, since
> evaluating free disk space on btrfs is a tricky and challenging task
> with no good solution in sight. This is why e.g. "btrfs fs usage /" is
> required, together with other tools, to get some idea of available disk space.

In the single device case, 'df' is expected to tell the truth. In the
multiple device case, it should still tell the truth, but it can be
confusing because it can't tell the whole truth. For that, there
is 'btrfs filesystem usage /mnt', which provides quite a lot more
information, to the degree that it can be confusing at first. But the
single device case is really straightforward; I just use 'df' and
'du' most of the time unless for some reason I want more information.
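For the single-device case, the numbers df prints come straight from statfs(). Below is a minimal sketch of the same arithmetic, assuming GNU df's definitions: size = f_blocks, used = f_blocks - f_bfree, available = f_bavail, all multiplied by f_frsize. It does not reproduce the per-profile breakdown of 'btrfs filesystem usage', and df_like is a hypothetical helper name.

```python
import os


def df_like(path: str = "/") -> dict:
    """Size, used and available bytes for the file system holding path,
    computed from statvfs() the way df does."""
    st = os.statvfs(path)
    return {
        "size": st.f_blocks * st.f_frsize,
        "used": (st.f_blocks - st.f_bfree) * st.f_frsize,
        # f_bavail is what unprivileged users can still allocate, so avail
        # can be less than size - used (e.g. ext4's reserved blocks)
        "avail": st.f_bavail * st.f_frsize,
    }


if __name__ == "__main__":
    for key, value in df_like("/").items():
        print(f"{key:>5}: {value / 2**30:8.2f} GiB")
```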

Recent example of multiple device confusion:
https://bugzilla.redhat.com/show_bug.cgi?id=1855174
https://lore.kernel.org/linux-btrfs/0326afd3-9e14-b682-30e7-1c8ae4481...@lechevalier.se/T/#t

-- 
Chris Murphy

Re: Fedora 33 System-Wide Change proposal: Make btrfs the default file system for desktop variants

2020-07-11 Thread Dominique Martinet

Artem Tim wrote on Sat, Jul 11, 2020:
> BTRFS WA is ~8 times higher than ext4. Average profit from compression
> about 50% max. Not that hard arithmetic.

It's not that simple.
The pattern used in that paper is far from a standard workload (random
writes within a file under CoW are just about as bad as things can get
with respect to write amplification), so things like the SQLite databases
Firefox keeps in your home directory will certainly be worse on btrfs in
that respect, even if compressed.

But if you're talking about open with truncate (or a new file), writing
in a single pass, closing and never writing again (like what happens when
you upgrade packages, compile something, download something, etc.), then
the difference won't be that big.

As Chris said multiple times, it's hard to find the right way to measure
the impact, and I don't have good solutions either, but this definitely
isn't the kind of usage I make of my filesystem.
I'd be tempted to believe the feedback from Facebook on that one, even
if, with snapshots in the mix, it's not 100% clear whether compression
has much impact by itself either...



BTW, given the size gains vs. time difference for compression, I would
advocate for plain zstd compression (the default level) instead of
zstd:1. I'd think another 12% compression improvement[1] for almost no
time difference isn't to be sneezed at?

[1] https://www.spinics.net/lists/fedora-devel/msg274978.html
-- 
Dominique

Re: Fedora 33 System-Wide Change proposal: Make btrfs the default file system for desktop variants

2020-07-11 Thread Artem Tim

BTRFS WA is ~8 times higher than ext4. Average profit from compression about 
50% max. Not that hard arithmetic.

Re: Fedora 33 System-Wide Change proposal: Make btrfs the default file system for desktop variants

2020-07-11 Thread Antti

> That said, as one of the change owners, I *want* to know about your
> issues.

Yes, I understand. It's just that I believe the burden of proof is on my
shoulders to show that I have this or that issue before making bug reports.
The problem I often face with btrfs is that its behaviour is highly
inconsistent, and that makes filing bug reports with concrete evidence of an
issue difficult. I should just make videos of the issues or something. Most
problems also only start after several months of serious usage and cannot
easily be replicated on other systems unless they're exactly the same model
with exactly the same kind of disks. This isn't a problem if you constantly
reformat your disks when hopping between distros, but if you just want to
continuously upgrade it will become an issue, or at least it does on my machines.

For example, btrfs has for a long time had this issue where, after several
months and with maybe more than 75% of disk space in use, when run on SSDs
the system can randomly stop reading from the file system, start thinking
and then eventually return. With each freeze the condition gets worse, and
eventually the system is permanently stuck and a power reset is required.

This happens, for example, when you open the Gnome Shell application launcher
several times in a row: the likelihood that Gnome completely freezes for
anywhere from a few seconds up to one minute increases. I don't see this
behaviour when using any other file system, so I've attributed it to btrfs,
but I have no way of knowing whether it is an actual issue in btrfs, other than
that it stopped when the disk was formatted with anything else.

Also notice that I wrote "maybe 75% full", because there is no way to know
the actual free disk space from "df -h" alone. The btrfs FAQ pages have
sections explaining that df lies about disk space on btrfs, since evaluating
free disk space on btrfs is a tricky and challenging task with no good
solution in sight. This is why e.g. "btrfs fs usage /" is required, together
with other tools, to get some idea of available disk space.

> Well, huh, I've not heard of a recommendation about JFS in a long
> time. For heavy I/O database workloads, I suggest XFS, though Btrfs
> can be made to work quite well for database workloads with stuff like
> nodatacow as I mentioned earlier.

Yeah, that came out of an email which was written some years ago. I'm not
planning on actively using anything other than ext4 or lvm+ext4 at the moment
in my daily life. However, as a result of the btrfs push I've started to look
for alternatives, in case there is something that could improve my workflow.
This includes testing out the current state of JFS, BCacheFS, etc.

I wanted to address some of the things you guys wrote, but after several days I
found myself writing more and more about just one particular thing: the
partitioning setup phase in Anaconda, especially the user's ability to easily
choose an alternative configuration and customise the partitioning during the
setup phase. I'll try to condense the most important points here.


> It is actually quite easy to choose an alternative configuration if
> you want. When you go through Anaconda installation and go to storage,
> you can choose "Custom", and from there you have a drop-down list of
> partitioning schemes: plain, LVM, LVM-thin, and Btrfs. You can select
> any of those and have Anaconda do a default setup based on that. The
> current default is "LVM", and we're changing the default to
> "Btrfs".
> But it's straightforward to make this change yourself at install time.
> 
> In my experience, YaST is actually pretty hard to use to switch to
> alternative configurations, so I'm surprised you say that it's
> difficult in Anaconda but not in YaST.
> 


No offense, but you're really out of touch when it comes to this issue. Unlike
Fedora, openSUSE has one of the best partitioning setup phases I know of. In
Fedora it is not easy to choose an alternative configuration or clearly
comprehend what it is going to do to the user's disks. That is because of UI
design gone bad, and also due to the low usability of Anaconda. Fedora's
partitioning is over-engineered and too clever for its own good, and it is like
a hack intended to patch a previous bad design.

Ever since around F21, or even slightly before, things have gone bad with it.
Thankfully blivet has been a real life-saver since it was introduced to
Anaconda, in F26 or F27 I think, especially when you have two or more disks and
just want to mount some partitions (e.g. /home) from one disk and reformat
existing partitions (e.g. /root & /boot) on another disk.

Issues with Fedora's partitioning include (but are not limited to): 1. too many 
confusing separate steps, 2. no clear overview of what is actively being done 
to disks, 3. weird UI element placements with confusing labels, 4. similarly 
named selections which

Re: Fedora 33 System-Wide Change proposal: Make btrfs the default file system for desktop variants

2020-07-10 Thread Tomasz Torcz

On Fri, Jul 10, 2020 at 07:14:09PM +0200, Vitaly Zaitsev via devel wrote:
> On 26.06.2020 16:42, Ben Cotton wrote:
> > ** transparent compression: significantly reduces write amplification,
> > improves lifespan of storage hardware
> 
> What can you say about this? https://arxiv.org/pdf/1707.08514.pdf

  Also funny note: when compression was introduced in ZFS, circa 2007,
it was mainly promoted as _performance_ win, not a space saving measure.
This was still 5 years before NVMe, so all we had was SATA, SAS and FC
drives, yet the CPUs were already multi-core and multi-gigahertz.
Transferring uncompressed data was _slower_ than compressing/decompressing
and having to transfer less data.  For a bit higher CPU usage we got
noticeable bandwidth wins.
  The tradeoff is no longer there, as single drives reach 7GiB/s
transfer speed.

-- 
Tomasz Torcz         Only gods can safely risk perfection,
to...@pipebreaker.pl it's a dangerous thing for a man.  — Alia

Re: Fedora 33 System-Wide Change proposal: Make btrfs the default file system for desktop variants

2020-07-10 Thread Tom Seewald

> It doesn't use compression so not relevant to the cited statement?

Well, the paper compares ext2, ext4, xfs, f2fs, and btrfs in terms of IO
amplification and states:
"In fact, in all our experiments, btrfs was an outlier, producing the highest
read, write, and space amplification."

The results listed in Tables 1 and 2 show that btrfs does incur higher amounts
of IO, so even with compression it's not at all obvious that this would bring
btrfs down to levels comparable to (or lower than) the other file systems.
Hence I believe Vitaly is linking this paper to suggest that evidence is needed
before we can confidently assert that btrfs + compression is better at
preserving NAND than ext4 or xfs.

Re: Fedora 33 System-Wide Change proposal: Make btrfs the default file system for desktop variants

2020-07-10 Thread Gordon Messmer


On 7/10/20 10:14 AM, Vitaly Zaitsev via devel wrote:

> On 26.06.2020 16:42, Ben Cotton wrote:
>
>> ** transparent compression: significantly reduces write amplification,
>> improves lifespan of storage hardware
>
> What can you say about this? https://arxiv.org/pdf/1707.08514.pdf



I would say that it illustrates the reason that compression is being 
proposed.  What did you take away from it?


Re: Fedora 33 System-Wide Change proposal: Make btrfs the default file system for desktop variants

2020-07-10 Thread drago01

On Friday, July 10, 2020, Vitaly Zaitsev via devel <
devel@lists.fedoraproject.org> wrote:

> On 26.06.2020 16:42, Ben Cotton wrote:
> > ** transparent compression: significantly reduces write amplification,
> > improves lifespan of storage hardware
>
> What can you say about this? https://arxiv.org/pdf/1707.08514.pdf


It doesn't use compression so not relevant to the cited statement?



Re: Fedora 33 System-Wide Change proposal: Make btrfs the default file system for desktop variants

2020-07-10 Thread Vitaly Zaitsev via devel

On 26.06.2020 16:42, Ben Cotton wrote:
> ** transparent compression: significantly reduces write amplification,
> improves lifespan of storage hardware

What can you say about this? https://arxiv.org/pdf/1707.08514.pdf

-- 
Sincerely,
  Vitaly Zaitsev (vit...@easycoding.org)

Re: Fedora 33 System-Wide Change proposal: Make btrfs the default file system for desktop variants

2020-07-10 Thread Przemek Klosowski via devel


On 7/9/20 2:24 PM, Eric Sandeen wrote:

> <50 runs later on btrfs>
>
> 16 readonly mounts failed (32% failure rate)
> Within the successful mounts, 1 or more files were unreachable in 30 attempts.
> Across all 50 attempts, 7720 files were lost.
>
> Is that better than ext4, and will ext4 need fsck just to be able to mount?
>
> <50 runs later on ext4, same strategy>
>
> zero mount failures for ext4.
> Within the successful mounts, 1 or more files were unreachable in 2 attempts.
> Across all 50 attempts, 48 files were lost.


But for that test to be meaningful, you need to check that the files
that ext4 recovers are actually what you expect. After all, if the
metadata is damaged and repaired incorrectly, it could point to some
random blocks and we'd never know. This is not just a theoretical
concern: I have seen this type of damage in fsck'ed systems, although I
admit it was long ago. The damage might be subtle; for instance, part
of the file could be correct while other parts are wrong, or the file
could be truncated.
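One way to do that check is to record checksums of the files that were written and synced before the test, keep that manifest on a different file system, and compare after each recovery. That way silently changed contents show up separately from missing files. This is a minimal sketch; the function names are hypothetical. Files that were still being written when the fault hit are expected to differ and should be excluded from the comparison.

```python
from __future__ import annotations

import hashlib
import os


def manifest(root: str) -> dict[str, str]:
    """Map each regular file under root (relative path) to its SHA-256."""
    result = {}
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            full = os.path.join(dirpath, name)
            if not os.path.isfile(full) or os.path.islink(full):
                continue  # skip symlinks, sockets, devices, etc.
            h = hashlib.sha256()
            with open(full, "rb") as f:
                for block in iter(lambda: f.read(1 << 20), b""):
                    h.update(block)
            result[os.path.relpath(full, root)] = h.hexdigest()
    return result


def compare(before: dict[str, str], after: dict[str, str]) -> dict[str, list[str]]:
    """Split files into missing (lost) and changed (silently corrupted)."""
    missing = sorted(p for p in before if p not in after)
    changed = sorted(p for p in before if p in after and before[p] != after[p])
    return {"missing": missing, "changed": changed}
```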


Btrfs will just give up if it screws up. You could see that as good or
bad. After all, if a disk holding your pictures went bad, maybe it is
useful to see partially damaged pictures, rather than having the
filesystem throw up its hands. On the other hand, btrfs being harsh like
that basically sends the message 'back up or else', which may be the
right thing in the end.


Re: Fedora 33 System-Wide Change proposal: Make btrfs the default file system for desktop variants

2020-07-10 Thread Alexander Ploumistos

On Fri, Jun 26, 2020 at 6:30 PM Josef Bacik  wrote:
>
> On 6/26/20 11:15 AM, Matthew Miller wrote:
> > On Fri, Jun 26, 2020 at 11:13:39AM -0400, Josef Bacik wrote:
> >> Not Fedora land, but Facebook installs it on all of our root
> >> devices, so millions of machines.  We've done this for 5 years.
> >> It's worked out very well. Thanks,
> >
> > Josef, I'd love to hear your comments on any differences between that
> > situation and the typical laptop-user case for Fedora desktop systems.
> > Anything we should consider?
> >
>
> We buy worse hardware than a typical laptop user uses, at least for our hard
> drives.  Also we hit our disks harder than most typical Fedora users.  Consider
> the web tier for example, we push the entire website to every box in the web
> tier (measured in hundreds of thousands of machines) probably 6-10 times a day.
> This is roughly 40 GiB of data, getting written to these truly terrible consumer
> grade flash drives (along with some spinning rust), 6-10 times a day.  In
> addition to the normal sort of logging, package updates, etc that happen.
>
> Also keep in mind we pay really close attention to burn rates for our drives,
> because obviously at our scale it translates to millions of dollars.  Btrfs has
> improved our burn rates with the compression, as the write amplification goes
> drastically down, thus extending the life of the drives.

Hi Josef,

Out of curiosity, do you also monitor SMART data for all your hard
drives? If so, have you seen any correlations between specific errors
reported by btrfs and those picked up by SMART (not necessarily the
fatal ones)? Any useful conclusions?

Best regards,
A.

Re: Fedora 33 System-Wide Change proposal: Make btrfs the default file system for desktop variants

On 7/9/20 9:15 PM, Josef Bacik wrote:
> On 7/9/20 9:30 PM, Eric Sandeen wrote:

...

>>> This test is run constantly by us, specifically because it's the error 
>>> cases that get you.  But not for crash consistency reasons, because we're 
>>> solid there.  I run them to make sure I don't have stupid things like 
>>> reference leaks or whatever in the error path.  Thanks,
>>
>> or "corrupted!" printk()s that terrify the hapless user? ;)
> 
> I'd love to know what hapless user is running xfstests.  Thanks,

*sigh*

the point is, telling the user "your filesystem is corrupted" if it's not 
actually corrupted is bad news.  Discovering that communication problem via 
xfstests does not make the concern less valid.  I was trying to gently tease 
you about the fact that the test discovers not only leaks, but also terrifyingly 
inaccurate messages in response to IO errors, but I guess that didn't come 
through.

Thanks.

-Eric

Re: Fedora 33 System-Wide Change proposal: Make btrfs the default file system for desktop variants

2020-07-09 Thread Josef Bacik

On 7/9/20 9:30 PM, Eric Sandeen wrote:

On 7/9/20 8:22 PM, Josef Bacik wrote:

On 7/9/20 7:23 PM, Eric Sandeen wrote:

On 7/9/20 4:27 PM, Eric Sandeen wrote:

On 7/9/20 3:32 PM, Davide Cavalca via devel wrote:

...

As someone on one of the teams at FB that has to deal with that, I can
assure you all the scenarios you listed can and do happen, and they
happen a lot. While we don't have the "laptop's out of battery" issue
on the production side, we have plenty of power events and unplanned
maintenances that can and will hit live machines and cut power off.
Forced reboots (triggered by either humans or automation) are also not
at all uncommon. Rebuilding machines from scratch isn't free, even with
all the automation and stuff we have, so if power loss or reboot events
on machines using btrfs caused widespread corruption or other issues
I'm confident we'd have found that out pretty early on.

It is a bare minimum expectation that filesystems like btrfs, ext4, and xfs
do not suffer filesystem corruptions and inconsistencies due to reboots
and power losses.

So for the record I am in no way insinuating that btrfs is less crash-safe
than other filesystems (though I have not tested that, so if I have time
I'll throw that into the mix as well.)

So, we already have those tests in xfstests, and I put btrfs through a few
loops. This is generic/475:

# Copyright (c) 2017 Oracle, Inc. All Rights Reserved.
#
# FS QA Test No. 475
#
# Test log recovery with repeated (simulated) disk failures. We kick
# off fsstress on the scratch fs, then switch out the underlying device
# with dm-error to see what happens when the disk goes down. Having
# taken down the fs in this manner, remount it and repeat. This test
# is a Good Enough (tm) simulation of our internal multipath failure
# testing efforts.

It fails within 2 loops. Is it a critical failure? I don't know; the
test looks for unexpected things in dmesg, and perhaps the filter is
wrong. But I see stack traces during the run, and message like:

[689284.484258] BTRFS: error (device dm-3) in btrfs_sync_log:3084: errno=-117
Filesystem corrupted

You might want to change that message, then. If it's not corrupted, I'd suggest not doing
printk("corrupted!") because that will make people think that it's corrupted, because it
says "Filesystem corrupted..." ;)

Yeah probably not the best, but again not something a user will generally see.

Yeah, because dm-error throws EIO, and thus we abort the transaction, which
results in an EUCLEAN if you run fsync. This is a scary sounding message, but
its _exactly_ what's expected from generic/475. I've been running this in a
loop for an hour and the thing hasn't failed yet. There's all sorts of scary
messages

That's weird. The test fails very quickly for me - again, AFAICT it fails due
to things in dmesg that aren't recognized as safe by the test harness, but a
variety of things - not just stack dumps - seem to trigger the failure.

Do you know what's tripping it? Because my loop is still running happily along.

[17929.939871] BTRFS warning (device dm-13): direct IO failed ino 261 rw
1,34817 sector 0xb8ce0 len 24576 err no 10
[17929.943099] BTRFS: error (device dm-13) in btrfs_commit_transaction:2323:
errno=-5 IO failure (Error while writing out transaction)

again, totally expected because we're forcing EIO's at random times.

Right, of course it will get IO errors, that's why I didn't highlight those in
my email.

so I can't say for sure.

Are btrfs devs using these tests to assess crash/powerloss resiliency
on a regular basis? TBH I honestly did not expect to see any test
failures here, whether or not they are test artifacts; any filesystem
using xfstests as a benchmark needs to be keeping things up to date.

It depends on the config options. Some of our transaction abort sites dump
stack, and that trips the dmesg filter, and thus it fails. Generally when I
run this test I turn those options off.

It would be good, in general, to fix up the test for btrfs so that it does not
yield false positives, if that's what this is. Otherwise it trains people to
ignore it or not run it

Except it doesn't, it's not failing for me now. Like I said we pay particularly
close attention to this test because it has a habit of finding memory leaks or
reference accounting bugs.

This test is run constantly by us, specifically because it's the error cases
that get you. But not for crash consistency reasons, because we're solid
there. I run them to make sure I don't have stupid things like reference leaks
or whatever in the error path. Thanks,

or "corrupted!" printk()s that terrify the hapless user? ;)

I'd love to know what hapless user is running xfstests. Thanks,

Josef

Re: Fedora 33 System-Wide Change proposal: Make btrfs the default file system for desktop variants

On 7/9/20 8:22 PM, Josef Bacik wrote:
> On 7/9/20 7:23 PM, Eric Sandeen wrote:
>> On 7/9/20 4:27 PM, Eric Sandeen wrote:
>>> On 7/9/20 3:32 PM, Davide Cavalca via devel wrote:
>>
>> ...
>>
>>>> As someone on one of the teams at FB that has to deal with that, I can
>>>> assure you all the scenarios you listed can and do happen, and they
>>>> happen a lot. While we don't have the "laptop's out of battery" issue
>>>> on the production side, we have plenty of power events and unplanned
>>>> maintenances that can and will hit live machines and cut power off.
>>>> Force reboots (triggered by either humans or automation) are also not
>>>> at all uncommon. Rebuilding machines from scratch isn't free, even with
>>>> all the automation and stuff we have, so if power loss or reboot events
>>>> on machines using btrfs caused widespread corruption or other issues
>>>> I'm confident we'd have found that out pretty early on.
>>>
>>> It is a bare minimum expectation that filesystems like btrfs, ext4, and xfs
>>> do not suffer filesystem corruptions and inconsistencies due to reboots
>>> and power losses.
>>>
>>> So for the record I am in no way insinuating that btrfs is less crash-safe
>>> than other filesystems (though I have not tested that, so if I have time
>>> I'll throw that into the mix as well.)
>>
>> So, we already have those tests in xfstests, and I put btrfs through a few
>> loops.  This is generic/475:
>>
>> # Copyright (c) 2017 Oracle, Inc.  All Rights Reserved.
>> #
>> # FS QA Test No. 475
>> #
>> # Test log recovery with repeated (simulated) disk failures.  We kick
>> # off fsstress on the scratch fs, then switch out the underlying device
>> # with dm-error to see what happens when the disk goes down.  Having
>> # taken down the fs in this manner, remount it and repeat.  This test
>> # is a Good Enough (tm) simulation of our internal multipath failure
>> # testing efforts.
>>
>> It fails within 2 loops.  Is it a critical failure? I don't know; the
>> test looks for unexpected things in dmesg, and perhaps the filter is
>> wrong.  But I see stack traces during the run, and message like:
>>
>> [689284.484258] BTRFS: error (device dm-3) in btrfs_sync_log:3084: 
>> errno=-117 Filesystem corrupted

You might want to change that message, then.  If it's not corrupted, I'd 
suggest not doing printk("corrupted!") because that will make people think that 
it's corrupted, because it says "Filesystem corrupted..." ;)

> 
> Yeah, because dm-error throws EIO, and thus we abort the transaction, which 
> results in an EUCLEAN if you run fsync.  This is a scary sounding message, 
> but it's _exactly_ what's expected from generic/475.  I've been running this 
> in a loop for an hour and the thing hasn't failed yet.  There's all sorts of 
> scary messages

That's weird.  The test fails very quickly for me - again, AFAICT it fails due 
to things in dmesg that aren't recognized as safe by the test harness, but a 
variety of things - not just stack dumps - seem to trigger the failure.

> [17929.939871] BTRFS warning (device dm-13): direct IO failed ino 261 rw 
> 1,34817 sector 0xb8ce0 len 24576 err no 10
> [17929.943099] BTRFS: error (device dm-13) in btrfs_commit_transaction:2323: 
> errno=-5 IO failure (Error while writing out transaction)
> 
> again, totally expected because we're forcing EIO's at random times.

Right, of course it will get IO errors, that's why I didn't highlight those in 
my email.

>> so I can't say for sure.
>>
>> Are btrfs devs using these tests to assess crash/powerloss resiliency
>> on a regular basis?  TBH I honestly did not expect to see any test
>> failures here, whether or not they are test artifacts; any filesystem
>> using xfstests as a benchmark needs to be keeping things up to date.
>>
> 
> It depends on the config options.  Some of our transaction abort sites dump 
> stack, and that trips the dmesg filter, and thus it fails.  Generally when I 
> run this test I turn those options off.

It would be good, in general, to fix up the test for btrfs so that it does not 
yield false positives, if that's what this is.  Otherwise it trains people to 
ignore it or not run it

> This test is run constantly by us, specifically because it's the error cases 
> that get you.  But not for crash consistency reasons, because we're solid 
> there.  I run them to make sure I don't have stupid things like reference 
> leaks or whatever in the error path.  Thanks,

or "corrupted!" printk()s that terrify the hapless user? ;)

-Eric

Re: Fedora 33 System-Wide Change proposal: Make btrfs the default file system for desktop variants

2020-07-09 Thread Josef Bacik


On 7/9/20 7:23 PM, Eric Sandeen wrote:
> On 7/9/20 4:27 PM, Eric Sandeen wrote:
>> On 7/9/20 3:32 PM, Davide Cavalca via devel wrote:
>
> ...
>
>>> As someone on one of the teams at FB that has to deal with that, I can
>>> assure you all the scenarios you listed can and do happen, and they
>>> happen a lot. While we don't have the "laptop's out of battery" issue
>>> on the production side, we have plenty of power events and unplanned
>>> maintenances that can and will hit live machines and cut power off.
>>> Force reboots (triggered by either humans or automation) are also not
>>> at all uncommon. Rebuilding machines from scratch isn't free, even with
>>> all the automation and stuff we have, so if power loss or reboot events
>>> on machines using btrfs caused widespread corruption or other issues
>>> I'm confident we'd have found that out pretty early on.
>>
>> It is a bare minimum expectation that filesystems like btrfs, ext4, and xfs
>> do not suffer filesystem corruptions and inconsistencies due to reboots
>> and power losses.
>>
>> So for the record I am in no way insinuating that btrfs is less crash-safe
>> than other filesystems (though I have not tested that, so if I have time
>> I'll throw that into the mix as well.)
>
> So, we already have those tests in xfstests, and I put btrfs through a few
> loops.  This is generic/475:
>
> # Copyright (c) 2017 Oracle, Inc.  All Rights Reserved.
> #
> # FS QA Test No. 475
> #
> # Test log recovery with repeated (simulated) disk failures.  We kick
> # off fsstress on the scratch fs, then switch out the underlying device
> # with dm-error to see what happens when the disk goes down.  Having
> # taken down the fs in this manner, remount it and repeat.  This test
> # is a Good Enough (tm) simulation of our internal multipath failure
> # testing efforts.
>
> It fails within 2 loops.  Is it a critical failure? I don't know; the
> test looks for unexpected things in dmesg, and perhaps the filter is
> wrong.  But I see stack traces during the run, and messages like:
>
> [689284.484258] BTRFS: error (device dm-3) in btrfs_sync_log:3084: errno=-117 Filesystem corrupted

Yeah, because dm-error throws EIO, and thus we abort the transaction, which 
results in an EUCLEAN if you run fsync.  This is a scary sounding message, but 
it's _exactly_ what's expected from generic/475.  I've been running this in a 
loop for an hour and the thing hasn't failed yet.  There's all sorts of scary 
messages


[17929.939871] BTRFS warning (device dm-13): direct IO failed ino 261 rw 1,34817 
sector 0xb8ce0 len 24576 err no 10
[17929.943099] BTRFS: error (device dm-13) in btrfs_commit_transaction:2323: 
errno=-5 IO failure (Error while writing out transaction)


again, totally expected because we're forcing EIO's at random times.
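The two errno values in these logs are negated Linux error codes: -5 is EIO, the error dm-error injects, and -117 is EUCLEAN ("Structure needs cleaning"), which the log line above pairs with the text "Filesystem corrupted". A minimal Python sketch that pulls the errno out of such BTRFS lines and names it follows; the regex, the `decode_btrfs_errno` helper, and the two-entry lookup table are assumptions for illustration, not anything btrfs or xfstests ships. The numbers are the common Linux values; EUCLEAN's number may differ on some architectures.

```python
import re

# Assumption: common Linux errno numbering (EIO is 5 everywhere; EUCLEAN is
# 117 on most architectures but may differ on some, e.g. MIPS).
ERRNO_NAMES = {5: "EIO", 117: "EUCLEAN"}

# Matches the "errno=-N" field printed in BTRFS error lines.
ERRNO_RE = re.compile(r"errno=-(\d+)")


def decode_btrfs_errno(line):
    """Return (errno_number, symbolic_name) for a BTRFS log line, or None."""
    match = ERRNO_RE.search(line)
    if match is None:
        return None
    number = int(match.group(1))
    return number, ERRNO_NAMES.get(number, "unknown")


if __name__ == "__main__":
    samples = [
        "[689284.484258] BTRFS: error (device dm-3) in btrfs_sync_log:3084: "
        "errno=-117 Filesystem corrupted",
        "[17929.943099] BTRFS: error (device dm-13) in "
        "btrfs_commit_transaction:2323: errno=-5 IO failure "
        "(Error while writing out transaction)",
    ]
    for sample in samples:
        print(decode_btrfs_errno(sample))
```

On Linux, Python's `errno.errorcode` dictionary gives the same symbolic names without a hand-written table; the table is used here only to keep the sketch independent of the host platform.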


> so I can't say for sure.
>
> Are btrfs devs using these tests to assess crash/powerloss resiliency
> on a regular basis?  TBH I honestly did not expect to see any test
> failures here, whether or not they are test artifacts; any filesystem
> using xfstests as a benchmark needs to be keeping things up to date.



It depends on the config options.  Some of our transaction abort sites dump 
stack, and that trips the dmesg filter, and thus it fails.  Generally when I run 
this test I turn those options off.
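Why a stack dump at a transaction abort site fails the run, while a plain btrfs error line does not, can be illustrated with a simplified dmesg filter. This is a hypothetical Python sketch, not the xfstests harness's actual dmesg check: the `FAILURE_PATTERNS` list, the `allowed` whitelist, and the `filter_dmesg` name are assumptions chosen to show the mechanism.

```python
import re

# Hypothetical "unexpected kernel output" markers; the real harness has its
# own list, this is only an illustrative subset.
FAILURE_PATTERNS = [
    re.compile(r"WARNING:"),
    re.compile(r"\bBUG:"),
    re.compile(r"Call Trace:"),
]


def filter_dmesg(lines, allowed=()):
    """Return the lines that would fail the test.

    A line fails if it matches any FAILURE_PATTERNS entry and contains
    none of the substrings in `allowed` (a per-test whitelist).
    """
    failures = []
    for line in lines:
        if any(allowed_text in line for allowed_text in allowed):
            continue
        if any(pattern.search(line) for pattern in FAILURE_PATTERNS):
            failures.append(line)
    return failures


if __name__ == "__main__":
    log = [
        "BTRFS: error (device dm-13) in btrfs_commit_transaction:2323: "
        "errno=-5 IO failure (Error while writing out transaction)",
        "WARNING: CPU: 1 PID: 42 at <abort site>",  # hypothetical stack-dump header
        "Call Trace:",
    ]
    # The plain error line passes; the dumped stack is what trips the filter.
    print(filter_dmesg(log))
```

Turning off the options that make abort sites dump stack removes the offending lines at the source; widening the harness's whitelist is the other lever, and it is the kind of change that would stop the test reporting false positives for everyone.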


This test is run constantly by us, specifically because it's the error cases
that get you.  But not for crash consistency reasons, because we're solid there.
I run them to make sure I don't have stupid things like reference leaks or
whatever in the error path.  Thanks,


Josef

Re: Fedora 33 System-Wide Change proposal: Make btrfs the default file system for desktop variants

On 7/9/20 4:27 PM, Eric Sandeen wrote:
> On 7/9/20 3:32 PM, Davide Cavalca via devel wrote:

...

>> As someone on one of the teams at FB that has to deal with that, I can
>> assure you all the scenarios you listed can and do happen, and they
>> happen a lot. While we don't have the "laptop's out of battery" issue
>> on the production side, we have plenty of power events and unplanned
>> maintenances that can and will hit live machines and cut power off.
>> Force reboots (triggered by either humans or automation) are also not
>> at all uncommon. Rebuilding machines from scratch isn't free, even with
>> all the automation and stuff we have, so if power loss or reboot events
>> on machines using btrfs caused widespread corruption or other issues
>> I'm confident we'd have found that out pretty early on.
> 
> It is a bare minimum expectation that filesystems like btrfs, ext4, and xfs
> do not suffer filesystem corruptions and inconsistencies due to reboots
> and power losses.
> 
> So for the record I am in no way insinuating that btrfs is less crash-safe
> than other filesystems (though I have not tested that, so if I have time
> I'll throw that into the mix as well.)

So, we already have those tests in xfstests, and I put btrfs through a few
loops.  This is generic/475:

# Copyright (c) 2017 Oracle, Inc.  All Rights Reserved.
#
# FS QA Test No. 475
#
# Test log recovery with repeated (simulated) disk failures.  We kick
# off fsstress on the scratch fs, then switch out the underlying device
# with dm-error to see what happens when the disk goes down.  Having
# taken down the fs in this manner, remount it and repeat.  This test
# is a Good Enough (tm) simulation of our internal multipath failure
# testing efforts.

It fails within 2 loops.  Is it a critical failure? I don't know; the
test looks for unexpected things in dmesg, and perhaps the filter is
wrong.  But I see stack traces during the run, and messages like:

[689284.484258] BTRFS: error (device dm-3) in btrfs_sync_log:3084: errno=-117 
Filesystem corrupted

so I can't say for sure.

Are btrfs devs using these tests to assess crash/powerloss resiliency
on a regular basis?  TBH I honestly did not expect to see any test
failures here, whether or not they are test artifacts; any filesystem
using xfstests as a benchmark needs to be keeping things up to date.

As a further test, I skipped the dmesg check, which may or may not be
finding false positives, and replaced it with a mount/umount/check cycle.
That seems to pass, so if fsck validation is complete and correct, perhaps
all is well in this regard.
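The generic/475 loop, with the dmesg scan swapped for the mount/umount/check verification described above, can be sketched as a dry-run command plan. Everything specific here is an assumption for illustration: the device-mapper name `scratch`, the backing device `/dev/vdb`, the 1 GiB size, the fsstress arguments, and the `plan_iteration` helper are hypothetical, and xfstests drives dm-error through its own helpers rather than this exact sequence. Nothing is executed; the function only returns the commands in order.

```python
from typing import List

# Hypothetical names for illustration only.
DM_NAME = "scratch"
BACKING_DEV = "/dev/vdb"
MOUNTPOINT = "/mnt/scratch"
SECTORS = 2097152  # 1 GiB expressed in 512-byte sectors


def linear_table(sectors: int, dev: str) -> str:
    # "start length linear device offset": pass I/O straight through.
    return f"0 {sectors} linear {dev} 0"


def error_table(sectors: int) -> str:
    # "start length error": every I/O to the range fails with EIO.
    return f"0 {sectors} error"


def swap_table(name: str, table: str) -> List[List[str]]:
    # Suspend without freezing the filesystem, load the new table, resume.
    return [
        ["dmsetup", "suspend", "--nolockfs", name],
        ["dmsetup", "load", name, "--table", table],
        ["dmsetup", "resume", name],
    ]


def plan_iteration(name: str = DM_NAME, dev: str = BACKING_DEV,
                   mnt: str = MOUNTPOINT,
                   sectors: int = SECTORS) -> List[List[str]]:
    dm_path = f"/dev/mapper/{name}"
    plan = [
        ["mount", dm_path, mnt],
        # In the real test this workload runs in the background.
        ["fsstress", "-d", mnt, "-n", "10000", "-p", "4"],
    ]
    plan += swap_table(name, error_table(sectors))        # the disk "goes down"
    plan.append(["umount", mnt])
    plan += swap_table(name, linear_table(sectors, dev))  # the disk comes back
    plan += [
        ["mount", dm_path, mnt],       # mount performs log recovery
        ["umount", mnt],
        ["btrfs", "check", dm_path],   # read-only check replaces the dmesg scan
    ]
    return plan


if __name__ == "__main__":
    for argv in plan_iteration():
        print(" ".join(argv))
```

Running the printed commands would need root and a scratch device. The dry-run form just makes the ordering explicit: errors are injected under a live workload, and the filesystem must then mount and pass `btrfs check` once the healthy table is restored.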

-Eric

Re: Fedora 33 System-Wide Change proposal: Make btrfs the default file system for desktop variants

2020-07-09 Thread Chris Murphy

On Thu, Jul 9, 2020 at 3:06 PM Stephen John Smoogen  wrote:
>
> That is because anyone who questions the perfection of ZFS is quickly
> burned at a stake.

I think Neal also has a good take on why, which is that it was mostly
a closed door development early on, wasn't used on heterogeneous
hardware out in the wild, upon release wasn't commonly available for
years - and just never really got the same kind of scrutiny and rumor
that Btrfs did.

>
> I don't know what it is about filesystems turning into religions that
> do not brook questioning but what I am seeing in these emails is what
> turns me off of btrfs every time it is brought up in the same way I
> couldn't stand reiser, ZFS, or various other filesystems. I realize
> filesystems take a lot of faith as people have to put something they
> value into a leap of faith it will be there the next day, but it
> seems to morph quickly into some sort of fanatical evangelical
> movement.

I've said this same thing in recent weeks. I don't understand it. I
don't know if you think I've done this. Certainly my experience over
10 years has been Btrfs developers have been among the least
defensive, and the first to say it doesn't meet every use case and of
course folks should use the file system that fits their requirements
the best.

> So a good reason why no one brings it up: you learn quickly that
> questioning the perfection of any filesystem will fill your inbox with
> tirades from people.

Yeah that's kind of an obnoxious pig pen.

--
Chris Murphy

Re: Fedora 33 System-Wide Change proposal: Make btrfs the default file system for desktop variants

2020-07-09 Thread Martin Kolman



- Original Message -
> From: "Josef Bacik" 
> To: devel@lists.fedoraproject.org
> Sent: Thursday, July 9, 2020 9:11:07 PM
> Subject: Re: Fedora 33 System-Wide Change proposal: Make btrfs the default 
> file system for desktop variants
> 
> On 7/9/20 1:51 PM, Eric Sandeen wrote:
> > On 7/6/20 12:07 AM, Chris Murphy wrote:
> >> On Fri, Jul 3, 2020 at 8:40 PM Eric Sandeen 
> >> wrote:
> >>>
> >>> On 7/3/20 1:41 PM, Chris Murphy wrote:
> >>>> SSDs can fail in weird ways. Some spew garbage as they're
> >>>> failing, some go read-only. I've seen both. I don't have stats on
> >>>> how common it is for an SSD to go read-only as it fails, but once
> >>>> it happens you cannot fsck it. It won't accept writes. If it
> >>>> won't mount, your only chance to recover data is some kind of
> >>>> offline scrape tool. And Btrfs does have a very very good scrape
> >>>> tool, in terms of its success rate - UX is scary. But that can
> >>>> and will improve.
> >>>
> >>> Ok, you and Josef have both recommended the btrfs restore
> >>> ("scrape") tool as a next recovery step after fsck fails, and I
> >>> figured we should check that out, to see if that alleviates the
> >>> concerns about recoverability of user data in the face of
> >>> corruption.
> >>>
> >>> I also realized that mkfs of an image isn't representative of an
> >>> SSD system typical of Fedora laptops, so I added "-m single" to
> >>> mkfs, because this will be the mkfs.btrfs default on SSDs (right?).
> >>> Based on Josef's description of fsck's algorithm of throwing away
> >>> any block with a bad CRC this seemed worth testing.
> >>>
> >>> I also turned fuzzing /down/ to hitting 2048 bytes out of the 1G
> >>> image, or a bit less than 1% of the filesystem blocks, at random.
> >>> This is 1/4 the fuzzing rate from the original test.
> >>>
> >>> So: -m single, fuzz 2048 bytes of 1G image, run btrfsck --repair,
> >>> mount, mount w/ recovery, and then restore ("scrape") if all that
> >>> fails, see what we get.
> >>
> >> What's the probability of this kind of corruption occurring in the
> >> real world? If the probability is so low it can't practically be
> >> computed, how do we assess the risk? And if we can't assess risk,
> >> what's the basis of concern?
> > 
> > From 20 years of filesystem development experience, I know that people
> > run filesystem repair tools.  It's just a fact.  For a wide variety of
> > reasons - from bugs, to hardware errors, to admin errors, you name it,
> > filesystems experience corruption and inconsistencies.  At that point
> > the administrator needs a path forward.
> > 
> > "people won't need to repair btrfs" is, IMHO, the position that needs
> > to be supported, not "filesystem repair tools should be robust."
> > 
> >>> I ran 50 loops, and got:
> >>>
> >>> 46 btrfsck failures, 20 mount failures
> >>>
> >>> So it ran btrfs restore 20 times; of those, 11 runs lost all or
> >>> substantially all of the files; 17 runs lost at least 1/3 of the
> >>> files.
> >>
> >> Josef states reliability of ext4, xfs, and Btrfs are in the same
> >> ballpark. He also reports one case in 10 years in which he failed to
> >> recover anything. How do you square that with 11 complete failures,
> >> trivially produced? Is there even a reason to suspect there's
> >> residual risk?
> > 
> > Extrapolating from Facebook's usecases to the fedora desktop should be
> > approached with caution, IMHO.
> > 
> > I've provided evidence that if/when damage happens for whatever reason,
> > btrfs is unable to recover in place far more often than other filesystems.
> > 
> >> When metadata is single profile, Btrfs is basically an early warning
> >> system.> The available research on uncorrectable errors, errors that drive
> >> ECC
> >> does not catch, suggests that users are decently likely to experience
> >> at least one block of corruption in the life of the drive. And that
> >> it tends to get worse up until drive failure. But there is much less
> >> chance to detect this, if the file system isn't also checksumming the
> >> vastly larger payload on a drive: the data.
> > 
> > One of the problems in this whole discus
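The fuzzing figures quoted above can be sanity-checked with a little arithmetic, and the corruption step itself is simple to sketch. Assuming 4 KiB filesystem blocks, a 1 GiB image has 262,144 blocks, so 2048 fuzzed bytes touch at most 2048/262,144, just under 0.8% of them, which matches "a bit less than 1%"; measured in btrfs's default 16 KiB metadata nodes the fraction would be roughly four times higher. The Python below makes the block size and image size explicit assumptions; `corrupt_random_bytes` is a hypothetical stand-in, not the script used for the test, and it XORs each chosen byte with a nonzero value so every chosen byte really changes.

```python
import random

IMAGE_SIZE = 1 << 30   # assumption: "1G image" means 1 GiB
BLOCK_SIZE = 4096      # assumption: 4 KiB filesystem blocks
FUZZ_BYTES = 2048


def fuzz_fraction(image_size=IMAGE_SIZE, block_size=BLOCK_SIZE,
                  nbytes=FUZZ_BYTES):
    """Fraction of blocks touched if every fuzzed byte lands in its own block."""
    return nbytes / (image_size // block_size)


def expected_distinct_blocks(image_size=IMAGE_SIZE, block_size=BLOCK_SIZE,
                             nbytes=FUZZ_BYTES):
    """Expected number of distinct blocks hit by uniformly random offsets."""
    blocks = image_size // block_size
    return blocks * (1 - (1 - 1 / blocks) ** nbytes)


def corrupt_random_bytes(data: bytearray, count: int, seed: int = 0) -> None:
    """Hypothetical fuzzer: change `count` distinct random bytes in place."""
    rng = random.Random(seed)
    for offset in rng.sample(range(len(data)), count):
        data[offset] ^= rng.randrange(1, 256)  # nonzero XOR always changes it


def percent(part: int, whole: int) -> float:
    return 100 * part / whole


if __name__ == "__main__":
    print(f"blocks touched (upper bound): {fuzz_fraction():.2%}")
    print(f"expected distinct blocks hit: {expected_distinct_blocks():.0f}")
    # Outcome rates from the 50-loop run quoted above.
    print(f"btrfsck failures: {percent(46, 50):.0f}% of loops")
    print(f"mount failures: {percent(20, 50):.0f}% of loops")
    print(f"restore lost all/most files: {percent(11, 20):.0f}% of restores")
    print(f"restore lost >= 1/3 of files: {percent(17, 20):.0f}% of restores")
```

Because random offsets occasionally collide in the same block, the expected number of distinct blocks hit is slightly below 2048 (about 2040), so the "upper bound" figure is a close estimate rather than an exact count.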

Re: Fedora 33 System-Wide Change proposal: Make btrfs the default file system for desktop variants

On 7/9/20 3:38 PM, Chris Murphy wrote:
> On Thu, Jul 9, 2020 at 1:57 PM Eric Sandeen  wrote:
>> On 7/9/20 2:11 PM, Josef Bacik wrote:
>>>> From what I've gathered from these responses, btrfs is unique in that it is
>>>> /expected/ that if anything goes wrong, the administrator should be prepared
>>>> to scrape out remaining data, re-mkfs, and start over.  If that's acceptable
>>>> for the Fedora desktop, that's fine, but I consider it a risk that should not
>>>> be ignored when evaluating this proposal.

>>> Agreed, it's the very first thing I said when I was asked what are the 
>>> downsides.  There's clearly more work to be done in the recovery arena.  
>>> How often do disks fail for Fedora?  Do we have that data?  Is this a real 
>>> risk? Nobody can say because Fedora doesn't have data.
>> But again, let me reiterate that disk failures are far from the only
>> reason that admins need capable filesystem repair tools, in general.
>>
>> We see users running fsck all the time, for various reasons.  I can't
>> back it up, but my hunch is that bugs and misconfigurations (i.e. write
>> cache) are more often the root cause for filesystem inconsistencies.
>>
>> IMHO, focusing on physical disk failure rates is focusing too narrowly,
>> but I suppose I'm just joining the chorus of hunches and anecdotes now.
> Actually there's quite a lot of evidence of this, even though there's
> no precise estimate - not least of which these populations are
> constantly dying and reemerging, and can be batch (firmware version)
> specific. This is only the most recent such story on linux-btrfs@ (and
> warning, this reads like an alien autopsy):
> 
> https://lore.kernel.org/linux-btrfs/20200708034407.ge10...@hungrycats.org/
> 
> fsck.btrfs is a no-op, same as fsck.xfs. And recently the actual
> repair utility dissuades users from running it casually.


Honestly, that's not relevant. They are no-ops because they do not need to
be run at boot time after an unclean shutdown, because the filesystems are
explicitly designed to handle that.  This is clearly stated in the man page,
the script itself, and the commit log.  In fact fsck.btrfs was copied from
fsck.xfs.

(Honestly fsck.ext[34] could be a no-op too, but for $REASONS it chooses to do
journal replay in userspace instead, via fsck.)

They are no-ops for this reason, and /not/ because fsck isn't /ever/ expected
to be needed.

-Eric

Re: Fedora 33 System-Wide Change proposal: Make btrfs the default file system for desktop variants

On 7/9/20 3:32 PM, Davide Cavalca via devel wrote:
> On Thu, 2020-07-09 at 16:15 -0400, Simo Sorce wrote:
>> However I have had bad kernels, power outages, loss of battery power
>> (laptops on too long suspend) and other random reasons to force reboot
>> a system. That has been the primary case of file system checks through
>> my Fedora usage. And luckily so far I never had a loss of filesystem or
>> data that way, fsck always ended up solving most of the issues, and
>> whenever I lost files they ended up being temporary files I did not care
>> for.
>>
>> I do not think those failures are common in Facebook fleets, so I am
>> quite skeptical FB data and failure modes are representative of Fedora
>> usage as a desktop/laptop OS and therefore of the behavior of btrfs in
>> those cases.
> 
> As someone on one of the teams at FB that has to deal with that, I can
> assure you all the scenarios you listed can and do happen, and they
> happen a lot. While we don't have the "laptop's out of battery" issue
> on the production side, we have plenty of power events and unplanned
> maintenances that can and will hit live machines and cut power off.
> Force reboots (triggered by either humans or automation) are also not
> at all uncommon. Rebuilding machines from scratch isn't free, even with
> all the automation and stuff we have, so if power loss or reboot events
> on machines using btrfs caused widespread corruption or other issues
> I'm confident we'd have found that out pretty early on.

It is a bare minimum expectation that filesystems like btrfs, ext4, and xfs
do not suffer filesystem corruptions and inconsistencies due to reboots
and power losses.

So for the record I am in no way insinuating that btrfs is less crash-safe
than other filesystems (though I have not tested that, so if I have time
I'll throw that into the mix as well.)

We do at times see corrupted filesystems when something has a writeback
cache w/o a battery backup, though, because then the hardware violates
its guarantees to the filesystem.  This is the sort of thing I'd put
in the "misconfiguration" bucket.  Which happens from time to time, and
from which it is nice to be able to recover w/o heroics.
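Why a writeback cache without battery backup breaks the filesystem's assumptions can be shown with a toy model. The Python below is a deliberately simplified, hypothetical simulation, not a description of how any real drive or btrfs behaves in detail: the filesystem writes a data block, issues a flush as an ordering barrier, then writes a superblock that points at the data. An honest cache persists everything before acknowledging the flush; a misconfigured one acknowledges the flush while still holding the data, may write entries back in any order, and loses whatever it still holds when power is cut.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ToyDisk:
    honors_flush: bool  # False models a volatile cache that ignores flushes
    media: Dict[str, str] = field(default_factory=dict)          # survives power loss
    cache: List[Tuple[str, str]] = field(default_factory=list)   # volatile

    def write(self, block: str, value: str) -> None:
        self.cache.append((block, value))  # acknowledged immediately

    def flush(self) -> None:
        if self.honors_flush:
            for block, value in self.cache:
                self.media[block] = value
            self.cache.clear()
        # Otherwise: acknowledge the flush but keep everything in the cache.

    def drain_one(self, index: int) -> None:
        """The cache writes back a single entry of its choosing (any order)."""
        block, value = self.cache.pop(index)
        self.media[block] = value

    def power_loss(self) -> None:
        self.cache.clear()  # unflushed writes vanish


def commit(disk: ToyDisk) -> None:
    disk.write("data", "new contents")
    disk.flush()  # barrier: data must be durable before the pointer
    disk.write("super", "points at data")


def consistent(disk: ToyDisk) -> bool:
    """The superblock may only be on media if the data it points at is too."""
    return "super" not in disk.media or "data" in disk.media


def scenario(honors_flush: bool) -> bool:
    disk = ToyDisk(honors_flush=honors_flush)
    commit(disk)
    disk.drain_one(len(disk.cache) - 1)  # cache writes the newest entry first
    disk.power_loss()
    return consistent(disk)


if __name__ == "__main__":
    print("honest cache consistent:", scenario(True))
    print("unsafe cache consistent:", scenario(False))
```

With the honest cache the data is already on media when the superblock lands, so the state survives the power cut; with the unsafe cache the superblock can reach media while the data it references is lost, which is the kind of inconsistency no crash-safe design can prevent once the hardware breaks its flush guarantee.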

-Eric

Re: Fedora 33 System-Wide Change proposal: Make btrfs the default file system for desktop variants

2020-07-09 Thread Neal Gompa

On Thu, Jul 9, 2020 at 4:16 PM Simo Sorce  wrote:
>
> On Thu, 2020-07-09 at 12:56 -0700, Eric Sandeen wrote:
> > On 7/9/20 2:11 PM, Josef Bacik wrote:
> > > > From what I've gathered from these responses, btrfs is unique in that
> > > > it is /expected/ that if anything goes wrong, the administrator should
> > > > be prepared to scrape out remaining data, re-mkfs, and start over.  If
> > > > that's acceptable for the Fedora desktop, that's fine, but I consider
> > > > it a risk that should not be ignored when evaluating this proposal.
> > > >
> > >
> > > Agreed, it's the very first thing I said when I was asked what are the 
> > > downsides.  There's clearly more work to be done in the recovery arena.  
> > > How often do disks fail for Fedora?  Do we have that data?  Is this a 
> > > real risk? Nobody can say because Fedora doesn't have data.
> >
> > But again, let me reiterate that disk failures are far from the only
> > reason that admins need capable filesystem repair tools, in general.
> >
> > We see users running fsck all the time, for various reasons.  I can't
> > back it up, but my hunch is that bugs and misconfigurations (i.e. write
> > cache) are more often the root cause for filesystem inconsistencies.
> >
> > IMHO, focusing on physical disk failure rates is focusing too narrowly,
> > but I suppose I'm just joining the chorus of hunches and anecdotes now.
>
> Anecdata,
> but I use raid-1 on all my disks (since a catastrophic failure 20 years
> ago) and that shielded me from all disk failures since then (although I
> may have had silent corruption over the years, I never lost any really
> important data that way; some pictures were probably lost that way, but
> it has been inconsequential for me).
>
> However I have had bad kernels, power outages, loss of battery power
> (laptops on too long suspend) and other random reasons to force reboot
> a system. That has been the primary case of file system checks through
> my Fedora usage. And luckily so far I never had a loss of filesystem or
> data that way, fsck always ended up solving most of the issues, and
> whenever I lost files they ended up being temporary files I did not care
> for.
>
> I do not think those failures are common in Facebook fleets, so I am
> quite skeptical FB data and failure modes are representative of Fedora
> usage as a desktop/laptop OS and therefore of the behavior of btrfs in
> those cases.
>
> Note, not saying btrfs should be avoided or anything, just that we need
> more data about those failure modes and how they affect btrfs before a
> change of defaults.
>

Maybe it's not the most helpful anecdotal data, but one of my
computers has been suffering through random CPU lockups to the point
where everything freezes and I need to reboot. I'm pretty sure there's
a fault in RAM or CPU (but with everything soldered on computers these
days...), but it's been nice to see that with these issues happening
to me fairly frequently (as in now basically weekly) forcing me to
power cycle, Btrfs has withstood that perfectly. No data loss, no
corruption, no inconsistencies. Everything just works. :)

The last time I had something like this with an ext4 system, it got
torched within a month, forcing me to spend money I couldn't really
afford to spend to replace the machine. Btrfs is letting me use a bad
computer longer. :)



-- 
真実はいつも一つ！/ Always, there's only one truth!

Re: Fedora 33 System-Wide Change proposal: Make btrfs the default file system for desktop variants

2020-07-09 Thread Simo Sorce

On Thu, 2020-07-09 at 13:32 -0700, Davide Cavalca via devel wrote:
> On Thu, 2020-07-09 at 16:15 -0400, Simo Sorce wrote:
> > However I have had bad kernels, power outages, loss of battery power
> > (laptops on too long suspend) and other random reasons to force reboot
> > a system. That has been the primary case of file system checks through
> > my Fedora usage. And luckily so far I never had a loss of filesystem or
> > data that way, fsck always ended up solving most of the issues, and
> > whenever I lost files they ended up being temporary files I did not care
> > for.
> > 
> > I do not think those failures are common in Facebook fleets, so I am
> > quite skeptical FB data and failure modes are representative of Fedora
> > usage as a desktop/laptop OS and therefore of the behavior of btrfs in
> > those cases.
> 
> As someone on one of the teams at FB that has to deal with that, I can
> assure you all the scenarios you listed can and do happen, and they
> happen a lot. While we don't have the "laptop's out of battery" issue
> on the production side, we have plenty of power events and unplanned
> maintenances that can and will hit live machines and cut power off.
> Force reboots (triggered by either humans or automation) are also not
> at all uncommon. Rebuilding machines from scratch isn't free, even with
> all the automation and stuff we have, so if power loss or reboot events
> on machines using btrfs caused widespread corruption or other issues
> I'm confident we'd have found that out pretty early on.

Oh this is really good to know, it is more reassuring!

Simo.

-- 
Simo Sorce
RHEL Crypto Team
Red Hat, Inc




Re: Fedora 33 System-Wide Change proposal: Make btrfs the default file system for desktop variants

2020-07-09 Thread Stephen John Smoogen

On Thu, 9 Jul 2020 at 16:49, Chris Murphy  wrote:
>
> On Thu, Jul 9, 2020 at 1:57 PM Eric Sandeen  wrote:
> >
> > On 7/9/20 2:11 PM, Josef Bacik wrote:
> > >>
> > >> From what I've gathered from these responses, btrfs is unique in that it is
> > >> /expected/ that if anything goes wrong, the administrator should be prepared
> > >> to scrape out remaining data, re-mkfs, and start over.  If that's acceptable
> > >> for the Fedora desktop, that's fine, but I consider it a risk that should not
> > >> be ignored when evaluating this proposal.
> > >>
> > >
> > > Agreed, it's the very first thing I said when I was asked what are the 
> > > downsides.  There's clearly more work to be done in the recovery arena.  
> > > How often do disks fail for Fedora?  Do we have that data?  Is this a 
> > > real risk? Nobody can say because Fedora doesn't have data.
> >
> > But again, let me reiterate that disk failures are far from the only
> > reason that admins need capable filesystem repair tools, in general.
> >
> > We see users running fsck all the time, for various reasons.  I can't
> > back it up, but my hunch is that bugs and misconfigurations (e.g. write
> > cache) are more often the root cause for filesystem inconsistencies.
> >
> > IMHO, focusing on physical disk failure rates is focusing too narrowly,
> > but I suppose I'm just joining the chorus of hunches and anecdotes now.
>
> Actually there's quite a lot of evidence of this, even though there's
> no precise estimate - not least of which these populations are
> constantly dying and reemerging, and can be batch (firmware version)
> specific. This is only the most recent such story on linux-btrfs@ (and
> warning, this reads like an alien autopsy):
>
> https://lore.kernel.org/linux-btrfs/20200708034407.ge10...@hungrycats.org/
>
> fsck.btrfs is a no-op, same as fsck.xfs. And recently the actual
> repair utility dissuades users from running it casually.
>
> COW file systems are different. ZFS has no fsck to speak of, it can be
> harassed badly by hardware/firmware bugs too, and yet there aren't
> many people who consider ZFS a problematic file system. How would the
> story of Btrfs be different either without dm-log-writes to this day,
> or had it already arrived in 2010?
>

That is because anyone who questions the perfection of ZFS is quickly
burned at the stake.

I don't know what it is about filesystems turning into religions that
do not brook questioning, but what I am seeing in these emails is what
turns me off of btrfs every time it is brought up, in the same way I
couldn't stand reiser, ZFS, or various other filesystems. I realize
filesystems take a lot of faith, as people have to put something they
value into a leap of faith that it will be there the next day, but it
seems to morph quickly into some sort of fanatical evangelical
movement.

So that's a good reason why no one brings it up: you learn quickly that
questioning the perfection of any filesystem will fill your inbox with
tirades from people.




-- 
Stephen J Smoogen.

Re: Fedora 33 System-Wide Change proposal: Make btrfs the default file system for desktop variants

2020-07-09 Thread Chris Murphy

On Thu, Jul 9, 2020 at 1:57 PM Eric Sandeen  wrote:
>
> On 7/9/20 2:11 PM, Josef Bacik wrote:
> >>
> >> From what I've gathered from these responses, btrfs is unique in that it is
> >> /expected/ that if anything goes wrong, the administrator should be prepared
> >> to scrape out remaining data, re-mkfs, and start over.  If that's acceptable
> >> for the Fedora desktop, that's fine, but I consider it a risk that should not
> >> be ignored when evaluating this proposal.
> >>
> >
> > Agreed, it's the very first thing I said when I was asked what are the 
> > downsides.  There's clearly more work to be done in the recovery arena.  
> > How often do disks fail for Fedora?  Do we have that data?  Is this a real 
> > risk? Nobody can say because Fedora doesn't have data.
>
> But again, let me reiterate that disk failures are far from the only
> reason that admins need capable filesystem repair tools, in general.
>
> We see users running fsck all the time, for various reasons.  I can't
> back it up, but my hunch is that bugs and misconfigurations (e.g. write
> cache) are more often the root cause for filesystem inconsistencies.
>
> IMHO, focusing on physical disk failure rates is focusing too narrowly,
> but I suppose I'm just joining the chorus of hunches and anecdotes now.

Actually there's quite a lot of evidence of this, even though there's
no precise estimate - not least of which these populations are
constantly dying and reemerging, and can be batch (firmware version)
specific. This is only the most recent such story on linux-btrfs@ (and
warning, this reads like an alien autopsy):

https://lore.kernel.org/linux-btrfs/20200708034407.ge10...@hungrycats.org/

fsck.btrfs is a no-op, same as fsck.xfs. And recently the actual
repair utility dissuades users from running it casually.

COW file systems are different. ZFS has no fsck to speak of, it can be
harassed badly by hardware/firmware bugs too, and yet there aren't
many people who consider ZFS a problematic file system. How would the
story of Btrfs be different either without dm-log-writes to this day,
or had it already arrived in 2010?


-- 
Chris Murphy

Re: Fedora 33 System-Wide Change proposal: Make btrfs the default file system for desktop variants

2020-07-09 Thread Davide Cavalca via devel

On Thu, 2020-07-09 at 16:15 -0400, Simo Sorce wrote:
> However I have had bad kernels, power outages, loss of battery power
> (laptops left in suspend too long), and other random reasons to
> force-reboot a system. That has been the primary cause of file system
> checks throughout my Fedora usage. And luckily, so far I have never
> had a loss of filesystem or data that way; fsck always ended up
> solving most of the issues, and whenever I lost files they turned out
> to be temporary files I did not care about.
> 
> I do not think those failures are common in Facebook fleets, so I am
> quite skeptical that FB data and failure modes are representative of
> Fedora usage as a desktop/laptop OS and therefore of the behavior of
> btrfs in those cases.

As someone on one of the teams at FB that has to deal with that, I can
assure you all the scenarios you listed can and do happen, and they
happen a lot. While we don't have the "laptop's out of battery" issue
on the production side, we have plenty of power events and unplanned
maintenances that can and will hit live machines and cut power off.
Force reboots (triggered by either humans or automation) are also not
at all uncommon. Rebuilding machines from scratch isn't free, even with
all the automation and stuff we have, so if power loss or reboot events
on machines using btrfs caused widespread corruption or other issues
I'm confident we'd have found that out pretty early on.

Cheers
Davide

Re: Fedora 33 System-Wide Change proposal: Make btrfs the default file system for desktop variants

2020-07-09 Thread Simo Sorce

On Thu, 2020-07-09 at 12:56 -0700, Eric Sandeen wrote:
> On 7/9/20 2:11 PM, Josef Bacik wrote:
> > > From what I've gathered from these responses, btrfs is unique in that it is
> > > /expected/ that if anything goes wrong, the administrator should be prepared
> > > to scrape out remaining data, re-mkfs, and start over.  If that's acceptable
> > > for the Fedora desktop, that's fine, but I consider it a risk that should not
> > > be ignored when evaluating this proposal.
> > > 
> > 
> > Agreed, it's the very first thing I said when I was asked what are the 
> > downsides.  There's clearly more work to be done in the recovery arena.  
> > How often do disks fail for Fedora?  Do we have that data?  Is this a real 
> > risk? Nobody can say because Fedora doesn't have data.
> 
> But again, let me reiterate that disk failures are far from the only
> reason that admins need capable filesystem repair tools, in general.
> 
> We see users running fsck all the time, for various reasons.  I can't
> back it up, but my hunch is that bugs and misconfigurations (e.g. write
> cache) are more often the root cause for filesystem inconsistencies.
> 
> IMHO, focusing on physical disk failure rates is focusing too narrowly,
> but I suppose I'm just joining the chorus of hunches and anecdotes now.

Anecdata, but I use RAID-1 on all my disks (since a catastrophic failure
20 years ago), and that has shielded me from all disk failures since then.
I may have had some silent corruption over the years, but I never lost any
really important data that way; a few pictures may have been lost, but
that has been inconsequential for me.

However I have had bad kernels, power outages, loss of battery power
(laptops left in suspend too long), and other random reasons to
force-reboot a system. That has been the primary cause of file system
checks throughout my Fedora usage. And luckily, so far I have never had
a loss of filesystem or data that way; fsck always ended up solving most
of the issues, and whenever I lost files they turned out to be temporary
files I did not care about.

I do not think those failures are common in Facebook fleets, so I am
quite skeptical that FB data and failure modes are representative of Fedora
usage as a desktop/laptop OS and therefore of the behavior of btrfs in
those cases.

Note, not saying btrfs should be avoided or anything, just that we need
more data about those failure modes and how they affect btrfs before a
change of defaults.

My 2c,
Simo.

-- 
Simo Sorce
RHEL Crypto Team
Red Hat, Inc


Re: Fedora 33 System-Wide Change proposal: Make btrfs the default file system for desktop variants

On 7/9/20 2:11 PM, Josef Bacik wrote:
>>
>> From what I've gathered from these responses, btrfs is unique in that it is
>> /expected/ that if anything goes wrong, the administrator should be prepared
>> to scrape out remaining data, re-mkfs, and start over.  If that's acceptable
>> for the Fedora desktop, that's fine, but I consider it a risk that should not
>> be ignored when evaluating this proposal.
>>
> 
> Agreed, it's the very first thing I said when I was asked what are the 
> downsides.  There's clearly more work to be done in the recovery arena.  How 
> often do disks fail for Fedora?  Do we have that data?  Is this a real risk? 
> Nobody can say because Fedora doesn't have data.

But again, let me reiterate that disk failures are far from the only
reason that admins need capable filesystem repair tools, in general.

We see users running fsck all the time, for various reasons.  I can't
back it up, but my hunch is that bugs and misconfigurations (e.g. write
cache) are more often the root cause for filesystem inconsistencies.

IMHO, focusing on physical disk failure rates is focusing too narrowly,
but I suppose I'm just joining the chorus of hunches and anecdotes now.

-Eric

> Facebook does however have that data, and it's a microscopically small 
> percentage.  I agree that Facebook is vastly different from Fedora from a 
> recovery standpoint, but our workloads and hardware I think extrapolate to 
> the normal Fedora user quite well.  We drive the disks harder than the normal 
> Fedora user does of course, but in the end we're updating packages, taking 
> snapshots, and building code.  We're just doing it at 1000x what a normal 
> Fedora user does.  Thanks,
> 
> Josef

Re: Fedora 33 System-Wide Change proposal: Make btrfs the default file system for desktop variants

2020-07-09 Thread Josef Bacik


On 7/9/20 1:51 PM, Eric Sandeen wrote:
> On 7/6/20 12:07 AM, Chris Murphy wrote:
>> On Fri, Jul 3, 2020 at 8:40 PM Eric Sandeen
>> wrote:
>>>
>>> On 7/3/20 1:41 PM, Chris Murphy wrote:
>>>> SSDs can fail in weird ways. Some spew garbage as they're
>>>> failing, some go read-only. I've seen both. I don't have stats on
>>>> how common it is for an SSD to go read-only as it fails, but once
>>>> it happens you cannot fsck it. It won't accept writes. If it
>>>> won't mount, your only chance to recover data is some kind of
>>>> offline scrape tool. And Btrfs does have a very very good scrape
>>>> tool, in terms of its success rate - UX is scary. But that can
>>>> and will improve.
>>>
>>> Ok, you and Josef have both recommended the btrfs restore
>>> ("scrape") tool as a next recovery step after fsck fails, and I
>>> figured we should check that out, to see if that alleviates the
>>> concerns about recoverability of user data in the face of
>>> corruption.
>>>
>>> I also realized that mkfs of an image isn't representative of an
>>> SSD system typical of Fedora laptops, so I added "-m single" to
>>> mkfs, because this will be the mkfs.btrfs default on SSDs (right?).
>>> Based on Josef's description of fsck's algorithm of throwing away
>>> any block with a bad CRC this seemed worth testing.
>>>
>>> I also turned fuzzing /down/ to hitting 2048 bytes out of the 1G
>>> image, or a bit less than 1% of the filesystem blocks, at random.
>>> This is 1/4 the fuzzing rate from the original test.
>>>
>>> So: -m single, fuzz 2048 bytes of 1G image, run btrfsck --repair,
>>> mount, mount w/ recovery, and then restore ("scrape") if all that
>>> fails, see what we get.
>>
>> What's the probability of this kind of corruption occurring in the
>> real world? If the probability is so low it can't practically be
>> computed, how do we assess the risk? And if we can't assess risk,
>> what's the basis of concern?
>
> From 20 years of filesystem development experience, I know that people
> run filesystem repair tools.  It's just a fact.  For a wide variety of
> reasons - from bugs, to hardware errors, to admin errors, you name it,
> filesystems experience corruption and inconsistencies.  At that point
> the administrator needs a path forward.
>
> "people won't need to repair btrfs" is, IMHO, the position that needs
> to be supported, not "filesystem repair tools should be robust."
>
>>> I ran 50 loops, and got:
>>>
>>> 46 btrfsck failures 20 mount failures
>>>
>>> So it ran btrfs restore 20 times; of those, 11 runs lost all or
>>> substantially all of the files; 17 runs lost at least 1/3 of the
>>> files.
>>
>> Josef states that the reliability of ext4, xfs, and Btrfs is in the same
>> ballpark. He also reports one case in 10 years in which he failed to
>> recover anything. How do you square that with 11 complete failures,
>> trivially produced? Is there even a reason to suspect there's
>> residual risk?
>
> Extrapolating from Facebook's use cases to the Fedora desktop should be
> approached with caution, IMHO.
>
> I've provided evidence that if/when damage happens for whatever reason,
> btrfs is unable to recover in place far more often than other filesystems.
>
>> When metadata is single profile, Btrfs is basically an early warning
>> system. The available research on uncorrectable errors, errors that drive ECC
>> does not catch, suggests that users are decently likely to experience
>> at least one block of corruption in the life of the drive. And that
>> it tends to get worse up until drive failure. But there is much less
>> chance to detect this, if the file system isn't also checksumming the
>> vastly larger payload on a drive: the data.
>
> One of the problems in this whole discussion is the assumption that filesystem
> inconsistencies only arise from disk bitflips etc; that's just not the case.
>
> Look, I'm just providing evidence of what I've found when re-evaluating the
> btrfs administration/repair tools.  I've found them to be quite weak.
>
> From what I've gathered from these responses, btrfs is unique in that it is
> /expected/ that if anything goes wrong, the administrator should be prepared
> to scrape out remaining data, re-mkfs, and start over.  If that's acceptable
> for the Fedora desktop, that's fine, but I consider it a risk that should not
> be ignored when evaluating this proposal.
>

Agreed, it's the very first thing I said when I was asked what are the
downsides.  There's clearly more work to be done in the recovery arena.  How
often do disks fail for Fedora?  Do we have that data?  Is this a real risk?
Nobody can say because Fedora doesn't have data.

Facebook does however have that data, and it's a microscopically small
percentage.  I agree that Facebook is vastly different from Fedora from a
recovery standpoint, but our workloads and hardware I think extrapolate to the
normal Fedora user quite well.  We drive the disks harder than the normal Fedora
user does of course, but in the end we're updating packages, taking snapshots,
and building code.  We're just doing it at 1000x what a normal Fedora user does.
Thanks,

Josef

Re: Fedora 33 System-Wide Change proposal: Make btrfs the default file system for desktop variants

On 7/6/20 8:21 PM, Chris Murphy wrote:

...

> Yes. Also in fuzzing there is the concept of "when to stop fuzzing"
> because it's a rabbit hole, you have to come up for air at some point,
> and work on other things. But you raise a good and subtle point which
> is also that ext4 has a very good fsck built up over decades, they
> succeed today from past failures. It's no different with Btrfs.
> 
> But also there is a bias. ext4 needs fsck to succeed in the worst
> cases in order to mount the file system.

Really?

> Btrfs doesn't need that.
> Often it can tolerate a read-only mount without any other mount
> option; 

Well, this assertion can be tested, so let's do that as well;
I'll do 50 runs of:

* mkfs w/ -m single as would happen on SSD
* fuzz 2048 bytes of that 1G image at random
* mount -o ro, tally mount failures
* count missing/unreachable files if mount -o ro succeeds
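The fuzzing step in the list above can be sketched in Python. This is a hypothetical helper, not Eric's actual harness (the thread does not show his script): it overwrites 2048 distinct, randomly chosen byte offsets of a buffer and counts how many blocks were hit, assuming 4 KiB filesystem blocks. It also works the arithmetic behind the "a bit less than 1%" figure: a 1 GiB image holds 2**30 / 4096 = 262,144 such blocks, so 2048 fuzzed bytes can touch at most 2048 / 262,144, about 0.78%, of them.

```python
import random
from typing import List, Optional

# Assumed geometry: a 1 GiB image with 4 KiB blocks.
IMAGE_SIZE = 2**30
BLOCK_SIZE = 4096
TOTAL_BLOCKS = IMAGE_SIZE // BLOCK_SIZE  # 262144
MAX_FRACTION = 2048 / TOTAL_BLOCKS  # 0.0078125, i.e. at most ~0.78% of blocks


def fuzz_buffer(buf, nbytes: int = 2048, seed: Optional[int] = None) -> List[int]:
    """Overwrite `nbytes` distinct random offsets of `buf` in place with random
    byte values and return the sorted offsets.

    `buf` can be a bytearray or a writable mmap.mmap of an image file; both
    support len() and single-byte item assignment. A replacement byte can
    equal the old value by chance, so slightly fewer bytes may change.
    """
    rng = random.Random(seed)
    offsets = sorted(rng.sample(range(len(buf)), nbytes))  # distinct offsets
    for off in offsets:
        buf[off] = rng.randrange(256)
    return offsets


def blocks_touched(offsets: List[int], block_size: int = BLOCK_SIZE) -> int:
    """Count the distinct block_size-aligned blocks that contain a fuzzed offset."""
    return len({off // block_size for off in offsets})
```

A real run could open the image with `open(path, "r+b")`, map it with `mmap.mmap(f.fileno(), 0)`, and pass the map to `fuzz_buffer`; whether Eric's test picked offsets uniformly like this is not stated in the thread.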

<50 runs later on btrfs>

16 read-only mounts failed (32% failure rate)
Within the successful mounts, 1 or more files were unreachable in 30 attempts.
Across all 50 attempts, 7720 files were lost.

Is that better than ext4, and will ext4 need fsck just to be able to mount?

<50 runs later on ext4, same strategy>

Zero mount failures for ext4.
Within the successful mounts, 1 or more files were unreachable in 2 attempts.
Across all 50 attempts, 48 files were lost.
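Restating those tallies as rates makes the comparison easier to read. The Python below only re-derives figures from the counts reported above; it is not Eric's test code. For btrfs, 16 of 50 mounts failed (32%), and 30 of the remaining 34 successful mounts (about 88%) had unreachable files; for ext4, no mount failed and 2 of 50 (4%) had unreachable files. The message does not say how lost files were counted for runs whose mount failed, so the per-run figure is simply each stated total divided by 50.

```python
RUNS = 50

# Counts exactly as reported in the message (50 runs per filesystem).
REPORTED = {
    "btrfs": {"mount_failures": 16, "runs_with_lost_files": 30, "files_lost": 7720},
    "ext4": {"mount_failures": 0, "runs_with_lost_files": 2, "files_lost": 48},
}


def summarize(counts: dict, runs: int = RUNS) -> dict:
    """Turn raw tallies into rates. The lossy-mount rate is the share of
    *successful* read-only mounts in which at least one file was unreachable."""
    mounted = runs - counts["mount_failures"]
    return {
        "mount_failure_rate": counts["mount_failures"] / runs,
        "successful_mounts": mounted,
        "lossy_mount_rate": counts["runs_with_lost_files"] / mounted,
        "files_lost_per_run": counts["files_lost"] / runs,
    }


for fs, counts in REPORTED.items():
    s = summarize(counts)
    print(f"{fs}: {s['mount_failure_rate']:.0%} of mounts failed, "
          f"{s['lossy_mount_rate']:.0%} of successful mounts lost files, "
          f"{s['files_lost_per_run']:.2f} files lost per run")
```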

It does not seem that btrfs has any unique or superior mount -o ro
recovery capabilities, either.

> and optionally can be made more tolerant to errors while still
> mounting read-only. This is a significant difference in recovery
> strategy. An fsck is something of a risk because it is writing changes
> to the file system. It is irreversible. Btrfs takes a different view,
> which is to increase the chance of recovery without needing a risky
> repair as the first step. Once your important data is out, now try the
> repair. Good chance it works, but maybe not as good as ext4's.

That's not supported by any of these test results.

-Eric

Re: Fedora 33 System-Wide Change proposal: Make btrfs the default file system for desktop variants

On 7/6/20 12:07 AM, Chris Murphy wrote:
> On Fri, Jul 3, 2020 at 8:40 PM Eric Sandeen 
> wrote:
>> 
>> On 7/3/20 1:41 PM, Chris Murphy wrote:
>>> SSDs can fail in weird ways. Some spew garbage as they're
>>> failing, some go read-only. I've seen both. I don't have stats on
>>> how common it is for an SSD to go read-only as it fails, but once
>>> it happens you cannot fsck it. It won't accept writes. If it
>>> won't mount, your only chance to recover data is some kind of
>>> offline scrape tool. And Btrfs does have a very very good scrape
>>> tool, in terms of its success rate - UX is scary. But that can
>>> and will improve.
>> 
>> Ok, you and Josef have both recommended the btrfs restore
>> ("scrape") tool as a next recovery step after fsck fails, and I
>> figured we should check that out, to see if that alleviates the
>> concerns about recoverability of user data in the face of
>> corruption.
>> 
>> I also realized that mkfs of an image isn't representative of an
>> SSD system typical of Fedora laptops, so I added "-m single" to
>> mkfs, because this will be the mkfs.btrfs default on SSDs (right?).
>> Based on Josef's description of fsck's algorithm of throwing away
>> any block with a bad CRC this seemed worth testing.
>> 
>> I also turned fuzzing /down/ to hitting 2048 bytes out of the 1G 
>> image, or a bit less than 1% of the filesystem blocks, at random. 
>> This is 1/4 the fuzzing rate from the original test.
>> 
>> So: -m single, fuzz 2048 bytes of 1G image, run btrfsck --repair, 
>> mount, mount w/ recovery, and then restore ("scrape") if all that 
>> fails, see what we get.
> 
> What's the probability of this kind of corruption occurring in the 
> real world? If the probability is so low it can't practically be 
> computed, how do we assess the risk? And if we can't assess risk, 
> what's the basis of concern?

From 20 years of filesystem development experience, I know that people
run filesystem repair tools.  It's just a fact.  For a wide variety of
reasons - from bugs, to hardware errors, to admin errors, you name it,
filesystems experience corruption and inconsistencies.  At that point
the administrator needs a path forward.

"people won't need to repair btrfs" is, IMHO, the position that needs
to be supported, not "filesystem repair tools should be robust."

>> I ran 50 loops, and got:
>> 
>> 46 btrfsck failures 20 mount failures
>> 
>> So it ran btrfs restore 20 times; of those, 11 runs lost all or 
>> substantially all of the files; 17 runs lost at least 1/3 of the 
>> files.
> 
> Josef states that the reliability of ext4, xfs, and Btrfs is in the same
> ballpark. He also reports one case in 10 years in which he failed to
> recover anything. How do you square that with 11 complete failures, 
> trivially produced? Is there even a reason to suspect there's
> residual risk?

Extrapolating from Facebook's use cases to the Fedora desktop should be
approached with caution, IMHO.

I've provided evidence that if/when damage happens for whatever reason,
btrfs is unable to recover in place far more often than other filesystems.

> When metadata is single profile, Btrfs is basically an early warning
> system. The available research on uncorrectable errors, errors that drive ECC
> does not catch, suggests that users are decently likely to experience
> at least one block of corruption in the life of the drive. And that
> it tends to get worse up until drive failure. But there is much less
> chance to detect this, if the file system isn't also checksumming the
> vastly larger payload on a drive: the data.

One of the problems in this whole discussion is the assumption that filesystem
inconsistencies only arise from disk bitflips etc; that's just not the case.

Look, I'm just providing evidence of what I've found when re-evaluating the
btrfs administration/repair tools.  I've found them to be quite weak.

From what I've gathered from these responses, btrfs is unique in that it is
/expected/ that if anything goes wrong, the administrator should be prepared
to scrape out remaining data, re-mkfs, and start over.  If that's acceptable
for the Fedora desktop, that's fine, but I consider it a risk that should not
be ignored when evaluating this proposal.

-Eric

Re: Fedora 33 System-Wide Change proposal: Make btrfs the default file system for desktop variants

2020-07-09 Thread Christopher Engelhard

On 08.07.20 23:47, Adam Williamson wrote:
> I think it's `efibootmgr -b  -L DefinitelyNotFedora`, where  is
> the number of the entry called 'Fedora', which you could find by just
> running `efibootmgr` to get a list of entries. -b selects the entry to
> operate on and -L changes the 'label', which I think is what we're
> dealing with here.

AFAIK efibootmgr can't change the label of an existing entry, but you
can delete it and then recreate it with the new name.
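A minimal Python sketch of that delete-and-recreate approach follows. It assumes the standard efibootmgr options (`-b NNNN -B` to delete an entry; `-c` with `-d`, `-p`, `-L` and `-l` to create one) and only prints the commands rather than running them. The boot number, ESP disk and partition are hypothetical, and the loader default is an assumption (the shim path Fedora usually uses on x86_64); check all of them against efibootmgr's own listing and man page before running anything as root.

```python
import shlex
from typing import List

# Assumption: the shim loader path Fedora normally installs on x86_64 UEFI systems.
FEDORA_LOADER = r"\EFI\fedora\shimx64.efi"
HEX_DIGITS = set("0123456789abcdefABCDEF")


def relabel_commands(bootnum: str, esp_disk: str, esp_part: int,
                     new_label: str, loader: str = FEDORA_LOADER) -> List[List[str]]:
    """Build (but do not run) efibootmgr argument lists that delete boot entry
    `bootnum` and recreate an entry for `loader` under `new_label`."""
    if len(bootnum) != 4 or not set(bootnum) <= HEX_DIGITS:
        raise ValueError("expected a four-digit hex boot number such as '0001'")
    delete = ["efibootmgr", "-b", bootnum, "-B"]  # -B deletes the entry selected by -b
    create = ["efibootmgr", "-c",                 # -c creates a new entry
              "-d", esp_disk, "-p", str(esp_part),  # disk and partition holding the ESP
              "-L", new_label, "-l", loader]      # label and loader path
    return [delete, create]


# Hypothetical values: entry 0001 is the first install, its ESP is /dev/sda1.
for argv in relabel_commands("0001", "/dev/sda", 1, "DefinitelyNotFedora"):
    print(shlex.join(argv))  # shell-quoted for review before running as root
```

`shlex.join` is used so the backslashes in the loader path survive if the printed command is pasted into a shell.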

Re: Fedora 33 System-Wide Change proposal: Make btrfs the default file system for desktop variants

2020-07-08 Thread Adam Williamson

On Wed, 2020-07-08 at 17:23 -0400, James Cassell wrote:
> On Tue, Jul 7, 2020, at 12:30 PM, Adam Williamson wrote:
> > On Mon, 2020-07-06 at 20:06 -0600, Chris Murphy wrote:
> > > On Mon, Jul 6, 2020 at 4:48 PM Gerald Henriksen  
> > > wrote:
> > > > 
> > > > On Wed, 1 Jul 2020 14:24:37 -0400, you wrote:
> > > > 
> > > > > On Wed, Jul 01, 2020 at 06:54:02AM +, Zbigniew Jędrzejewski-Szmek wrote:
> > > > > > Making btrfs opt-in for F33 and (assuming the results go well)
> > > > > > opt-out for F34 could be a good option. I know technically it is
> > > > > > already opt-in, but it's not very visible or popular. We could make
> > > > > > the btrfs option more prominent and ask people to pick it if they
> > > > > > are ready to handle potential fallout.
> > > > > 
> > > > > I'm leaning towards recommending this as well. I feel like we don't
> > > > > have good data to make a decision on -- the work that Red Hat did
> > > > > previously when making a decision was 1) years ago and 2)
> > > > > server-focused, and the Facebook production usage is encouraging but
> > > > > also not the same use case. I'm particularly concerned about metadata
> > > > > corruption fragility as noted in the Usenix paper. (It'd be nice if we
> > > > > could do something about that!)
> > > > 
> > > > So if one has a spare partition to play with btrfs, is there an easy
> > > > way to install a second copy of Fedora without having the /boot/efi/
> > > > entries overwrite the existing Fedora installation?  Or fix it to have
> > > > 2 separate entries after the fact?
> > > 
> > > 
> > > It's possible but has challenges. With separate ESPs you'll need to either
> > > (a) use the firmware's built-in boot manager to choose what will
> > > probably appear to be identically named Fedoras
> > 
> > No, you have to rename the first one before doing the second install.
> > anaconda explicitly deletes any existing efibootmgr entry named
> > "Fedora" before creating a new one.
> 
> Any idea if this process is documented?

I think it's `efibootmgr -b  -L DefinitelyNotFedora`, where  is
the number of the entry called 'Fedora', which you could find by just
running `efibootmgr` to get a list of entries. -b selects the entry to
operate on and -L changes the 'label', which I think is what we're
dealing with here.
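Finding that entry number can also be scripted. The sketch below is hypothetical and assumes the usual `BootNNNN* Label` lines that `efibootmgr` prints (the `*` marks an active entry; newer releases may append a tab and the device path, which the sketch ignores). The sample listing is invented for illustration.

```python
import re
from typing import List

# "Boot" + four hex digits, an optional "*", whitespace, then the label up to
# an optional tab-separated device path. Lines like "BootCurrent:" and
# "BootOrder:" do not match because their fifth character is not a hex digit.
ENTRY_RE = re.compile(r"^Boot([0-9A-Fa-f]{4})\*?\s+([^\t]*)")


def entries_with_label(listing: str, label: str) -> List[str]:
    """Return the boot numbers of all entries whose label is exactly `label`."""
    found = []
    for line in listing.splitlines():
        m = ENTRY_RE.match(line)
        if m and m.group(2).strip() == label:
            found.append(m.group(1))
    return found


# Invented listing in the shape efibootmgr prints.
SAMPLE = """BootCurrent: 0001
Timeout: 1 seconds
BootOrder: 0001,0000
Boot0000* Windows Boot Manager
Boot0001* Fedora
"""

print(entries_with_label(SAMPLE, "Fedora"))  # ['0001']
```

The lookup is the same whatever is then done with the number; with more than one match (for example after two installs), every entry carrying that label is returned.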

If you do that before doing the second install, you *should* be able to
choose between them by using whatever mechanism your firmware offers to
select an EFI boot manager entry at boot time. The one called
DefinitelyNotFedora would be the first install, the one called Fedora
would be the second install.
-- 
Adam Williamson
Fedora QA Community Monkey
IRC: adamw | Twitter: AdamW_Fedora | XMPP: adamw AT happyassassin . net
http://www.happyassassin.net

Re: Fedora 33 System-Wide Change proposal: Make btrfs the default file system for desktop variants

2020-07-08 Thread James Cassell

On Tue, Jul 7, 2020, at 12:30 PM, Adam Williamson wrote:
> On Mon, 2020-07-06 at 20:06 -0600, Chris Murphy wrote:
> > On Mon, Jul 6, 2020 at 4:48 PM Gerald Henriksen  wrote:
> > > 
> > > On Wed, 1 Jul 2020 14:24:37 -0400, you wrote:
> > > 
> > > > On Wed, Jul 01, 2020 at 06:54:02AM +, Zbigniew Jędrzejewski-Szmek wrote:
> > > > > Making btrfs opt-in for F33 and (assuming the result go well) opt-out for F34
> > > > > could be good option. I know technically it is already opt-in, but it's not
> > > > > very visible or popular. We could make the btrfs option more prominent and
> > > > > ask people to pick it if they are ready to handle potential fallout.
> > > > 
> > > > I'm leaning towards recommending this as well. I feel like we don't have
> > > > good data to make a decision on -- the work that Red Hat did previously when
> > > > making a decision was 1) years ago and 2) server-focused, and the Facebook
> > > > production usage is encouraging but also not the same use case. I'm
> > > > particularly concerned about metadata corruption fragility as noted in the
> > > > Usenix paper. (It'd be nice if we could do something about that!)
> > > 
> > > So if one has a spare partition to play with btrfs, is there an easy
> > > way to install a second copy of Fedora without having the /boot/efi/
> > > entries overwrite the existing Fedora installation?  Or fix it to have
> > > 2 separate entries after the fact?
> > 
> > 
> > It's possible but has challenges. Separate ESP's you'll need to either
> > (a) use the firmware's built-in boot manager to choose what will
> > probably appear to be identically named Fedora's
> 
> No, you have to rename the first one before doing the second install.
> anaconda explicitly deletes any existing efibootmgr entry named
> "Fedora" before creating a new one.

Any idea if this process is documented?

I typically install on a laptop, with the "encrypt my data" option.

I can confirm that the only way to successfully have 2 side-by-side Fedora 
installs with UEFI, using only Anaconda to set it up, is to have 2 separate 
physical disks, and choose which physical disk to boot by hitting F12 at 
machine power on.

Any attempts to share /boot result in at least one of the installs being broken.

Any attempts to share /boot/efi break at least fedora-by-fedora installs.

Adding a separate /boot/efi partition for the second Fedora install makes the 
resulting system usable on the new Fedora install, but there is no obvious way 
to boot into the older Fedora install.

If you unlock the disks within Anaconda for the existing Fedora install, GRUB 
gets boot entries for that install, but they are non-functional.  (There is no 
prompt for the disk unlock password; it just hangs indefinitely.)

What /does/ seem to work is having RHEL and Fedora side-by-side on the same 
disk, as long as each has its own /boot and /boot/efi partitions.

Generally, I'd like the fedora-by-fedora parallel installs to work better 
because that's how I'm best able to participate in the Test Matrix.

V/r,
James Cassell

Re: Fedora 33 System-Wide Change proposal: Make btrfs the default file system for desktop variants

2020-07-07 Thread Markus Larsson



On 7 July 2020 18:31:32 CEST, Adam Williamson wrote:
>On Tue, 2020-07-07 at 06:02 +, Zbigniew Jędrzejewski-Szmek wrote:
>> On Mon, Jul 06, 2020 at 08:06:05PM -0600, Chris Murphy wrote:
>> > On Mon, Jul 6, 2020 at 4:48 PM Gerald Henriksen  wrote:
>> > > So if one has a spare partition to play with btrfs, is there an easy
>> > > way to install a second copy of Fedora without having the /boot/efi/
>> > > entries overwrite the existing Fedora installation?  Or fix it to have
>> > > 2 separate entries after the fact?
>> > 
>> > It's possible but has challenges. Separate ESP's you'll need to either
>> > (a) use the firmware's built-in boot manager to choose what will
>> > probably appear to be identically named Fedora's (b) add new NVRAM
>> > entries, and names, and switch between them before reboot by using
>> > efibootmgr --bootorder or --bootnext.
>> > 
>> > Another option is shared ESP and /boot but my vague recollection is
>> > some things go away. For sure /boot/efi/EFI/fedora is replaced, and
>> > then possibly /boot/loader/entries are replaced. But that might be
>> > easier to deal with than the above, and more efficient.
>> 
>> This is so sad. Boot Loader Specification was explicitly designed to
>> support parallel installations on a single ESP. (The case of different
>> systems was the goal, but the general logic works for different
>> installations of the same system as well.) BLS entries are stored
>> underneath $ESP/, so different Fedora installations which
>> have different machine-id numbers simply don't conflict. sd-boot just
>> displays the combined list. If two entries happen to be *exactly* the
>> same — same os name, same os version, same kernel version — it'll use
>> the machine-id in the entry title to disambiguate them to the user (*).
>> 
>> There is really no reason for this not to work. If are considering
>> separate ESPs and efibootmgr to switch between them then something
>> went rather wrong somewhere.
>
>I can't speak for Chris, but I was honestly just gaming it out in my
>head, trying to think how I'd try it if I was going to do it. I've
>never actually tried it myself.

The easy way to do it is to keep the same ESP and solve it with a nice little GRUB config.
It works well even between distributions.
You can of course break it if one of the distributions overwrites it incorrectly, but that's easily fixed and prevented.

M

Re: Fedora 33 System-Wide Change proposal: Make btrfs the default file system for desktop variants

2020-07-07 Thread Adam Williamson

On Tue, 2020-07-07 at 06:02 +, Zbigniew Jędrzejewski-Szmek wrote:
> On Mon, Jul 06, 2020 at 08:06:05PM -0600, Chris Murphy wrote:
> > On Mon, Jul 6, 2020 at 4:48 PM Gerald Henriksen  wrote:
> > > So if one has a spare partition to play with btrfs, is there an easy
> > > way to install a second copy of Fedora without having the /boot/efi/
> > > entries overwrite the existing Fedora installation?  Or fix it to have
> > > 2 separate entries after the fact?
> > 
> > It's possible but has challenges. Separate ESP's you'll need to either
> > (a) use the firmware's built-in boot manager to choose what will
> > probably appear to be identically named Fedora's (b) add new NVRAM
> > entries, and names, and switch between them before reboot by using
> > efibootmgr --bootorder or --bootnext.
> > 
> > Another option is shared ESP and /boot but my vague recollection is
> > some things go away. For sure /boot/efi/EFI/fedora is replaced, and
> > then possibly /boot/loader/entries are replaced. But that might be
> > easier to deal with than the above, and more efficient.
> 
> This is so sad. Boot Loader Specification was explicitly designed to
> support parallel installations on a single ESP. (The case of different
> systems was the goal, but the general logic works for different
> installations of the same system as well.) BLS entries are stored
> underneath $ESP/, so different Fedora installations which
> have different machine-id numbers simply don't conflict. sd-boot just
> displays the combined list. If two entries happen to be *exactly* the
> same — same os name, same os version, same kernel version — it'll use
> the machine-id in the entry title to disambiguate them to the user (*).
> 
> There is really no reason for this not to work. If are considering
> separate ESPs and efibootmgr to switch between them then something
> went rather wrong somewhere.

I can't speak for Chris, but I was honestly just gaming it out in my
head, trying to think how I'd try it if I was going to do it. I've
never actually tried it myself.
-- 
Adam Williamson
Fedora QA Community Monkey
IRC: adamw | Twitter: AdamW_Fedora | XMPP: adamw AT happyassassin . net
http://www.happyassassin.net

Re: Fedora 33 System-Wide Change proposal: Make btrfs the default file system for desktop variants

2020-07-07 Thread Adam Williamson

On Mon, 2020-07-06 at 20:06 -0600, Chris Murphy wrote:
> On Mon, Jul 6, 2020 at 4:48 PM Gerald Henriksen  wrote:
> > 
> > On Wed, 1 Jul 2020 14:24:37 -0400, you wrote:
> > 
> > > On Wed, Jul 01, 2020 at 06:54:02AM +, Zbigniew Jędrzejewski-Szmek wrote:
> > > > Making btrfs opt-in for F33 and (assuming the result go well) opt-out for F34
> > > > could be good option. I know technically it is already opt-in, but it's not
> > > > very visible or popular. We could make the btrfs option more prominent and
> > > > ask people to pick it if they are ready to handle potential fallout.
> > > 
> > > I'm leaning towards recommending this as well. I feel like we don't have
> > > good data to make a decision on -- the work that Red Hat did previously when
> > > making a decision was 1) years ago and 2) server-focused, and the Facebook
> > > production usage is encouraging but also not the same use case. I'm
> > > particularly concerned about metadata corruption fragility as noted in the
> > > Usenix paper. (It'd be nice if we could do something about that!)
> > 
> > So if one has a spare partition to play with btrfs, is there an easy
> > way to install a second copy of Fedora without having the /boot/efi/
> > entries overwrite the existing Fedora installation?  Or fix it to have
> > 2 separate entries after the fact?
> 
> 
> It's possible but has challenges. Separate ESP's you'll need to either
> (a) use the firmware's built-in boot manager to choose what will
> probably appear to be identically named Fedora's

No, you have to rename the first one before doing the second install.
anaconda explicitly deletes any existing efibootmgr entry named
"Fedora" before creating a new one.
-- 
Adam Williamson
Fedora QA Community Monkey
IRC: adamw | Twitter: AdamW_Fedora | XMPP: adamw AT happyassassin . net
http://www.happyassassin.net

Re: Fedora 33 System-Wide Change proposal: Make btrfs the default file system for desktop variants

2020-07-07 Thread Chris Murphy

On Tue, Jul 7, 2020 at 9:25 AM Lennart Poettering  wrote:

> Thou shallt not have multiple ESPs per disk. See:
>
> https://news.ycombinator.com/item?id=16261695
>
> The EFI spec is kinda vague about it, but it breaks everywhere, in
> particular with Windows.

The Windows *installer* doesn't like it. I'm not aware of Windows
itself having difficulty with it. I have tested this layout. But it
could be that UEFI bugs abound.

In any case, I'm not advocating two ESPs in general, so thanks for
the criticism. To be clear, I offer it only for advanced users who are
somewhat prepared for confusion of unknown manifestation. :)

-- 
Chris Murphy

Re: Fedora 33 System-Wide Change proposal: Make btrfs the default file system for desktop variants

2020-07-07 Thread Lennart Poettering

On Mon, 06.07.20 20:06, Chris Murphy (li...@colorremedies.com) wrote:

> On Mon, Jul 6, 2020 at 4:48 PM Gerald Henriksen  wrote:
> >
> > On Wed, 1 Jul 2020 14:24:37 -0400, you wrote:
> >
> > >On Wed, Jul 01, 2020 at 06:54:02AM +, Zbigniew Jędrzejewski-Szmek wrote:
> > >> Making btrfs opt-in for F33 and (assuming the result go well) opt-out for F34
> > >> could be good option. I know technically it is already opt-in, but it's not
> > >> very visible or popular. We could make the btrfs option more prominent and
> > >> ask people to pick it if they are ready to handle potential fallout.
> > >
> > >I'm leaning towards recommending this as well. I feel like we don't have
> > >good data to make a decision on -- the work that Red Hat did previously when
> > >making a decision was 1) years ago and 2) server-focused, and the Facebook
> > >production usage is encouraging but also not the same use case. I'm
> > >particularly concerned about metadata corruption fragility as noted in the
> > >Usenix paper. (It'd be nice if we could do something about that!)
> >
> > So if one has a spare partition to play with btrfs, is there an easy
> > way to install a second copy of Fedora without having the /boot/efi/
> > entries overwrite the existing Fedora installation?  Or fix it to have
> > 2 separate entries after the fact?
>
>
> It's possible but has challenges. Separate ESP's you'll need to either

Thou shalt not have multiple ESPs per disk. See:

https://news.ycombinator.com/item?id=16261695

The EFI spec is kinda vague about it, but it breaks everywhere, in
particular with Windows.

Lennart

--
Lennart Poettering, Berlin

Re: Fedora 33 System-Wide Change proposal: Make btrfs the default file system for desktop variants

2020-07-07 Thread Matthew Miller

On Wed, Jul 01, 2020 at 03:50:37PM -0400, Josef Bacik wrote:
> I've stated this many times before, btrfs is more vulnerable to
> things going wrong.  It's also more likely to notice things going
> wrong.  There's things we can do to make it easier in the face of
> these issues, they're patches I've written and submitted in the last
> few days.  There's bigger, more complex things that I can do to make
> us more resilient in the face of these corruptions.  But even with
> all of the things I have in my head, I could still go do one or two
> things and render the file system unusable.  Would these things
> happen in practice?  Unlikely.  Is it impossible?  Unfortunately no.

Thanks Josef. I definitely appreciate your responsiveness here, and your
explanation helps me understand things better.


-- 
Matthew Miller

Fedora Project Leader

Re: Fedora 33 System-Wide Change proposal: Make btrfs the default file system for desktop variants

2020-07-07 Thread Qiyu Yan

On Tue, Jul 7, 2020 at 6:09 PM, David Sterba wrote:
>
> > Yes, BtrFs was very unstable, but before. Every software has this process. I
> > have talked to one of the maintainer of BtrFs, she thinks that BtrFs is ready
> > to production usage. (few years before, she is strongly against using BtrFs
> > for production purpose).
>
> May I ask who was the person you talked to? I'm asking as the active maintainer
> of btrfs. I'm familiar who does what in the community and overall status so it
> would be of my community interest to know who is speaking on behalf of the
> project, without me having even a slightest idea who that could be.
I may have taken it wrongly: she is a developer at SUSE, not a maintainer.
Sorry for my mistake; that was only a personal opinion.

And it was a private conversation.
>
> If you don't want to disclose the name in public, feel free to respond in
> private.
>
> Thanks.

Re: Fedora 33 System-Wide Change proposal: Make btrfs the default file system for desktop variants

2020-07-07 Thread David Sterba

> Yes, BtrFs was very unstable, but before. Every software has this process.  I
> have talked to one of the maintainer of BtrFs, she thinks that BtrFs is ready
> to production usage. (few years before, she is strongly against using BtrFs
> for production purpose).

May I ask who was the person you talked to? I'm asking as the active maintainer
of btrfs. I'm familiar with who does what in the community and with the overall
status, so it would be of interest to me and to the community to know who is
speaking on behalf of the project, given that I don't have even the slightest
idea who that could be.

If you don't want to disclose the name in public, feel free to respond in
private.

Thanks.

Re: Fedora 33 System-Wide Change proposal: Make btrfs the default file system for desktop variants

2020-07-07 Thread Zbigniew Jędrzejewski-Szmek

On Mon, Jul 06, 2020 at 08:06:05PM -0600, Chris Murphy wrote:
> On Mon, Jul 6, 2020 at 4:48 PM Gerald Henriksen  wrote:
> > So if one has a spare partition to play with btrfs, is there an easy
> > way to install a second copy of Fedora without having the /boot/efi/
> > entries overwrite the existing Fedora installation?  Or fix it to have
> > 2 separate entries after the fact?
>
> It's possible but has challenges. Separate ESP's you'll need to either
> (a) use the firmware's built-in boot manager to choose what will
> probably appear to be identically named Fedora's (b) add new NVRAM
> entries, and names, and switch between them before reboot by using
> efibootmgr --bootorder or --bootnext.
> 
> Another option is shared ESP and /boot but my vague recollection is
> some things go away. For sure /boot/efi/EFI/fedora is replaced, and
> then possibly /boot/loader/entries are replaced. But that might be
> easier to deal with than the above, and more efficient.

This is so sad. Boot Loader Specification was explicitly designed to
support parallel installations on a single ESP. (The case of different
systems was the goal, but the general logic works for different
installations of the same system as well.) BLS entries are stored
underneath $ESP/, so different Fedora installations which
have different machine-id numbers simply don't conflict. sd-boot just
displays the combined list. If two entries happen to be *exactly* the
same — same os name, same os version, same kernel version — it'll use
the machine-id in the entry title to disambiguate them to the user (*).
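A toy model of the disambiguation rule described above: titles are built from OS name, OS version, and kernel version, and the machine-id is appended only when two titles would otherwise be identical. This is an illustration in Python under that assumption, not sd-boot's actual code; the `Entry` type and `display_titles` helper are hypothetical.

```python
from collections import Counter
from typing import List, NamedTuple


class Entry(NamedTuple):
    os_name: str
    os_version: str
    kernel: str
    machine_id: str


def display_titles(entries: List[Entry]) -> List[str]:
    """Append the machine-id only to titles that would otherwise collide."""
    base = [f"{e.os_name} {e.os_version} {e.kernel}" for e in entries]
    seen = Counter(base)
    return [
        f"{title} {e.machine_id}" if seen[title] > 1 else title
        for title, e in zip(base, entries)
    ]
```

Entries that are already unique keep their short title, so the machine-id only shows up in the confusing case.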

There is really no reason for this not to work. If we are considering
separate ESPs and efibootmgr to switch between them, then something
went rather wrong somewhere.

(*) If there are two installations with overlapping kernel versions, the UI
is not going to be great, because there will be entries like
Fedora 33 (Workstation) 5.11.21-23.fc33.amd64 08a5690a2eed47cf92ac0a5d2e3cf6b0
Fedora 33 (Workstation) 5.11.21-23.fc33.amd64 949499494994999393939ad2ad99
Fedora 33 (Workstation) 5.10.11-18.fc33.amd64 08a5690a2eed47cf92ac0a5d2e3cf6b0
Fedora 33 (Workstation) 5.10.11-18.fc33.amd64 949499494994999393939ad2ad99
i.e. the entries for the two installations will be interleaved. So the
user needs to remember that e.g. 08a5690a2eed47cf92ac0a5d2e3cf6b0 is the
installation with ext4 and 949499494994999393939ad2ad99 the one with
btrfs.
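A short sketch of why the entries interleave: sorting only on kernel version puts the two installations' entries side by side, while grouping on machine-id first keeps each installation together. The numeric version key is a crude assumption (not the real version comparison any boot loader uses), and the helper names are hypothetical; the version strings and machine-ids are the illustrative ones from the footnote.

```python
import re
from typing import List, Tuple

# (kernel version, machine-id)
Entry = Tuple[str, str]


def version_key(kernel: str) -> Tuple[int, ...]:
    """Crude numeric key: '5.11.21-23.fc33.amd64' -> (5, 11, 21, 23, 33, 64)."""
    return tuple(int(n) for n in re.findall(r"\d+", kernel))


def newest_first(entries: List[Entry]) -> List[Entry]:
    """Sort on kernel version alone, which interleaves the two installs."""
    # sorted() stays stable with reverse=True, so ties keep their input order.
    return sorted(entries, key=lambda e: version_key(e[0]), reverse=True)


def grouped_by_install(entries: List[Entry]) -> List[Entry]:
    """Group by machine-id; the stable sort keeps newest-first within a group."""
    return sorted(newest_first(entries), key=lambda e: e[1])
```

Grouping trades a strict newest-first order for a menu where each installation's kernels sit together.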

Zbyszek

Re: Fedora 33 System-Wide Change proposal: Make btrfs the default file system for desktop variants

2020-07-06 Thread Chris Murphy

On Mon, Jul 6, 2020 at 5:30 PM Samuel Sieb  wrote:
>
> On 7/6/20 4:24 PM, Adam Williamson wrote:
> > If you mean the EFI boot manager entry, just renaming the existing one
> > something other than "Fedora" ought to do the trick, I think. So far as
> > /boot/efi goes...well, you have two choices. You can have the two
> > installs share one, or have two separate ones. I *think* both options
> > at least in theory ought to work, I'm not sure if anyone's tested...
>
> Adding another EFI partition should work, but I don't see how they could
> share one.  There's a single Fedora directory in there, so each install
> would overwrite the files of the other.  The grub.cfg can only point to
> one set of boot loader entries.

And that means sharing one ESP also means sharing /boot, yeah. Hmm. I'll
have to test it, but I'm pretty sure it's a fairly simple post-install
fix to get that to work. I'm not totally certain how blscfg.mod parses
/boot/loader/entries containing BLS snippets with two machine IDs.

Oh, I just thought of something. F32 and older depend on grubenv to
store part of the kernel command line variables. So I think it will be
necessary to do an F32 and F33 side-by-side install for it to work. Two
F32s side by side will compete over the single grubenv.
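To illustrate the grubenv contention, here is a rough Python model of a GRUB environment block: a `# GRUB Environment Block` header, `name=value` lines, and `#` padding to 1024 bytes. It assumes the F32-era `kernelopts` variable and ignores GRUB's escaping of special characters; the helper names and UUIDs are hypothetical. Two installs writing the same key to one shared block simply overwrite each other.

```python
from typing import Dict

HEADER = "# GRUB Environment Block\n"
SIZE = 1024  # assumed default environment block size, in bytes


def parse(block: str) -> Dict[str, str]:
    """Read name=value pairs; '#' lines are comments or padding."""
    if not block.startswith(HEADER):
        raise ValueError("not a GRUB environment block")
    env: Dict[str, str] = {}
    for line in block[len(HEADER):].split("\n"):
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")  # values may contain '='
        env[key] = value
    return env


def render(env: Dict[str, str]) -> str:
    """Write the block back out, padded with '#' to exactly SIZE bytes."""
    body = HEADER + "".join(f"{k}={v}\n" for k, v in env.items())
    if len(body) > SIZE:
        raise ValueError("environment does not fit in the block")
    return body + "#" * (SIZE - len(body))


def set_kernelopts(block: str, opts: str) -> str:
    """What each install effectively does on kernel updates: last writer wins."""
    env = parse(block)
    env["kernelopts"] = opts
    return render(env)
```

With one shared block, whichever install updated `kernelopts` most recently decides the command line both installs' BLS entries expand.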

-- 
Chris Murphy

Re: Fedora 33 System-Wide Change proposal: Make btrfs the default file system for desktop variants

2020-07-06 Thread Chris Murphy

On Mon, Jul 6, 2020 at 4:48 PM Gerald Henriksen  wrote:
>
> On Wed, 1 Jul 2020 14:24:37 -0400, you wrote:
>
> >On Wed, Jul 01, 2020 at 06:54:02AM +, Zbigniew Jędrzejewski-Szmek wrote:
> >> Making btrfs opt-in for F33 and (assuming the result go well) opt-out for F34
> >> could be good option. I know technically it is already opt-in, but it's not
> >> very visible or popular. We could make the btrfs option more prominent and
> >> ask people to pick it if they are ready to handle potential fallout.
> >
> >I'm leaning towards recommending this as well. I feel like we don't have
> >good data to make a decision on -- the work that Red Hat did previously when
> >making a decision was 1) years ago and 2) server-focused, and the Facebook
> >production usage is encouraging but also not the same use case. I'm
> >particularly concerned about metadata corruption fragility as noted in the
> >Usenix paper. (It'd be nice if we could do something about that!)
>
> So if one has a spare partition to play with btrfs, is there an easy
> way to install a second copy of Fedora without having the /boot/efi/
> entries overwrite the existing Fedora installation?  Or fix it to have
> 2 separate entries after the fact?


It's possible but has challenges. With separate ESPs, you'll need to either
(a) use the firmware's built-in boot manager to choose between what will
probably appear to be identically named Fedoras, or (b) add new NVRAM
entries with distinct names and switch between them before reboot using
efibootmgr --bootorder or --bootnext.
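A small Python sketch of the two switching styles in (b), using hypothetical boot numbers: `--bootnext` sets a one-shot target for the next boot only, while `--bootorder` rewrites the persistent order. `promote` and `switch_argv` are made-up helper names; they only build the argv and never call efibootmgr.

```python
from typing import List


def promote(boot_order: str, bootnum: str) -> str:
    """Move `bootnum` to the front of a comma-separated BootOrder string."""
    entries = [e.strip().upper() for e in boot_order.split(",") if e.strip()]
    target = bootnum.upper()
    if target not in entries:
        raise ValueError(f"Boot{target} is not in BootOrder {boot_order!r}")
    return ",".join([target] + [e for e in entries if e != target])


def switch_argv(boot_order: str, bootnum: str, persistent: bool) -> List[str]:
    """Argv for a persistent (--bootorder) or one-shot (--bootnext) switch."""
    if persistent:
        return ["efibootmgr", "--bootorder", promote(boot_order, bootnum)]
    return ["efibootmgr", "--bootnext", bootnum.upper()]
```

The one-shot form is the less surprising choice for occasional testing, since the firmware falls back to the normal order on the following boot.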

Another option is a shared ESP and /boot, but my vague recollection is
that some things go away. For sure /boot/efi/EFI/fedora is replaced, and
possibly /boot/loader/entries are replaced too. But that might be
easier to deal with than the above, and more efficient.

-- 
Chris Murphy

Re: Fedora 33 System-Wide Change proposal: Make btrfs the default file system for desktop variants

2020-07-06 Thread Chris Murphy

On Mon, Jul 6, 2020 at 9:52 AM Stephen John Smoogen  wrote:
>
> On Mon, 6 Jul 2020 at 01:19, Chris Murphy  wrote:
> >
> > On Fri, Jul 3, 2020 at 8:40 PM Eric Sandeen  wrote:
> > >
> > > On 7/3/20 1:41 PM, Chris Murphy wrote:
> > > > SSDs can fail in weird ways. Some spew garbage as they're failing,
> > > > some go read-only. I've seen both. I don't have stats on how common it
> > > > is for an SSD to go read-only as it fails, but once it happens you
> > > > cannot fsck it. It won't accept writes. If it won't mount, your only
> > > > chance to recover data is some kind of offline scrape tool. And Btrfs
> > > > does have a very very good scrape tool, in terms of its success rate -
> > > > UX is scary. But that can and will improve.
> > >
> > > Ok, you and Josef have both recommended the btrfs restore ("scrape")
> > > tool as a next recovery step after fsck fails, and I figured we should
> > > check that out, to see if that alleviates the concerns about
> > > recoverability of user data in the face of corruption.
> > >
> > > I also realized that mkfs of an image isn't representative of an SSD
> > > system typical of Fedora laptops, so I added "-m single" to mkfs,
> > > because this will be the mkfs.btrfs default on SSDs (right?).  Based
> > > on Josef's description of fsck's algorithm of throwing away any
> > > block with a bad CRC this seemed worth testing.
> > >
> > > I also turned fuzzing /down/ to hitting 2048 bytes out of the 1G
> > > image, or a bit less than 1% of the filesystem blocks, at random.
> > > This is 1/4 the fuzzing rate from the original test.
> > >
> > > So: -m single, fuzz 2048 bytes of 1G image, run btrfsck --repair,
> > > mount, mount w/ recovery, and then restore ("scrape") if all that
> > > fails, see what we get.
> >
> > What's the probability of this kind of corruption occurring in the
> > real world? If the probability is so low it can't practically be
> > computed, how do we assess the risk? And if we can't assess risk,
> > what's the basis of concern?
> >
>
> Aren't most disk failure tests 'huh it somehow happened at least once
> and I think this explains all these other failures too?' I know that
> with giant clusters you can do more testing but you also have a lot of
> things like
>
> What is the chance that a disk will die over time? 100%
> What is the chance that a disk died from this particular scenario?
> 0.0 %
> reword the question slightly differently.. What is the chance this
> disk died from that scenario? 100%.

Yes. Fuzzing also has the concept of "when to stop fuzzing", because
it's a rabbit hole: you have to come up for air at some point and
work on other things. But you also raise a good and subtle point, which
is that ext4 has a very good fsck built up over decades; it
succeeds today because of past failures. It's no different with Btrfs.

But there is also a bias. ext4 needs fsck to succeed in the worst
cases in order to mount the file system. Btrfs doesn't need that.
Often it can be mounted read-only without any other mount option,
and it can optionally be made more tolerant of errors while still
mounting read-only. This is a significant difference in recovery
strategy. An fsck is something of a risk because it writes changes
to the file system, and those changes are irreversible. Btrfs takes a
different view, which is to increase the chance of recovery without
needing a risky repair as the first step. Once your important data is
out, then try the repair. There's a good chance it works, but maybe not
as good a chance as with ext4.

--
Chris Murphy

Re: Fedora 33 System-Wide Change proposal: Make btrfs the default file system for desktop variants

2020-07-06 Thread Samuel Sieb


On 7/6/20 4:24 PM, Adam Williamson wrote:

> If you mean the EFI boot manager entry, just renaming the existing one
> something other than "Fedora" ought to do the trick, I think. So far as
> /boot/efi goes...well, you have two choices. You can have the two
> installs share one, or have two separate ones. I *think* both options
> at least in theory ought to work, I'm not sure if anyone's tested...


Adding another EFI partition should work, but I don't see how they could 
share one.  There's a single Fedora directory in there, so each install 
would overwrite the files of the other.  The grub.cfg can only point to 
one set of boot loader entries.


Re: Fedora 33 System-Wide Change proposal: Make btrfs the default file system for desktop variants

2020-07-06 Thread Adam Williamson

On Mon, 2020-07-06 at 16:24 -0700, Adam Williamson wrote:
> On Mon, 2020-07-06 at 18:48 -0400, Gerald Henriksen wrote:
> > On Wed, 1 Jul 2020 14:24:37 -0400, you wrote:
> > 
> > > On Wed, Jul 01, 2020 at 06:54:02AM +, Zbigniew Jędrzejewski-Szmek wrote:
> > > > Making btrfs opt-in for F33 and (assuming the result go well) opt-out for F34
> > > > could be good option. I know technically it is already opt-in, but it's not
> > > > very visible or popular. We could make the btrfs option more prominent and
> > > > ask people to pick it if they are ready to handle potential fallout.
> > > 
> > > I'm leaning towards recommending this as well. I feel like we don't have
> > > good data to make a decision on -- the work that Red Hat did previously when
> > > making a decision was 1) years ago and 2) server-focused, and the Facebook
> > > production usage is encouraging but also not the same use case. I'm
> > > particularly concerned about metadata corruption fragility as noted in the
> > > Usenix paper. (It'd be nice if we could do something about that!)
> > 
> > So if one has a spare partition to play with btrfs, is there an easy
> > way to install a second copy of Fedora without having the /boot/efi/
> > entries overwrite the existing Fedora installation?  Or fix it to have
> > 2 separate entries after the fact?
> 
> If you mean the EFI boot manager entry, just renaming the existing one
> to something other than "Fedora" ought to do the trick, I think. So far as
> /boot/efi goes...well, you have two choices. You can have the two
> installs share one, or have two separate ones. I *think* both options
> at least in theory ought to work, I'm not sure if anyone's tested...

actually, no, thinking about it harder, I think sharing one wouldn't
work right. I think you have to have a new /boot/efi for the second
install, as well as a separate /boot.
-- 
Adam Williamson
Fedora QA Community Monkey
IRC: adamw | Twitter: AdamW_Fedora | XMPP: adamw AT happyassassin . net
http://www.happyassassin.net

Re: Fedora 33 System-Wide Change proposal: Make btrfs the default file system for desktop variants

2020-07-06 Thread Adam Williamson

On Mon, 2020-07-06 at 18:48 -0400, Gerald Henriksen wrote:
> On Wed, 1 Jul 2020 14:24:37 -0400, you wrote:
> 
> > On Wed, Jul 01, 2020 at 06:54:02AM +, Zbigniew Jędrzejewski-Szmek wrote:
> > > Making btrfs opt-in for F33 and (assuming the results go well) opt-out for
> > > F34 could be a good option. I know technically it is already opt-in, but
> > > it's not very visible or popular. We could make the btrfs option more
> > > prominent and ask people to pick it if they are ready to handle potential
> > > fallout.
> > 
> > I'm leaning towards recommending this as well. I feel like we don't have
> > good data to make a decision on -- the work that Red Hat did previously when
> > making a decision was 1) years ago and 2) server-focused, and the Facebook
> > production usage is encouraging but also not the same use case. I'm
> > particularly concerned about metadata corruption fragility as noted in the
> > Usenix paper. (It'd be nice if we could do something about that!)
> 
> So if one has a spare partition to play with btrfs, is there an easy
> way to install a second copy of Fedora without having the /boot/efi/
> entries overwrite the existing Fedora installation?  Or fix it to have
> 2 separate entries after the fact?

If you mean the EFI boot manager entry, just renaming the existing one
to something other than "Fedora" ought to do the trick, I think. So far as
/boot/efi goes...well, you have two choices. You can have the two
installs share one, or have two separate ones. I *think* both options
at least in theory ought to work, I'm not sure if anyone's tested...
-- 
Adam Williamson
Fedora QA Community Monkey
IRC: adamw | Twitter: AdamW_Fedora | XMPP: adamw AT happyassassin . net
http://www.happyassassin.net

Re: Fedora 33 System-Wide Change proposal: Make btrfs the default file system for desktop variants

2020-07-06 Thread Gerald Henriksen

On Wed, 1 Jul 2020 14:24:37 -0400, you wrote:

>On Wed, Jul 01, 2020 at 06:54:02AM +, Zbigniew Jędrzejewski-Szmek wrote:
>> Making btrfs opt-in for F33 and (assuming the results go well) opt-out for F34
>> could be a good option. I know technically it is already opt-in, but it's not
>> very visible or popular. We could make the btrfs option more prominent and
>> ask people to pick it if they are ready to handle potential fallout.
>
>I'm leaning towards recommending this as well. I feel like we don't have
>good data to make a decision on -- the work that Red Hat did previously when
>making a decision was 1) years ago and 2) server-focused, and the Facebook
>production usage is encouraging but also not the same use case. I'm
>particularly concerned about metadata corruption fragility as noted in the
>Usenix paper. (It'd be nice if we could do something about that!)

So if one has a spare partition to play with btrfs, is there an easy
way to install a second copy of Fedora without having the /boot/efi/
entries overwrite the existing Fedora installation?  Or fix it to have
2 separate entries after the fact?

Re: Fedora 33 System-Wide Change proposal: Make btrfs the default file system for desktop variants

2020-07-06 Thread Przemek Klosowski via devel


On 7/2/20 4:38 PM, Eric Sandeen wrote:
> Running 10 loops on each of btrfs, ext4, and xfs, I got results that look
> like this (ext4 always creates an empty lost+found, so it will always find
> at least 1 file there):
>
> btrfs
> ...
> == 4 fsck failures, 2 mount failures
>
> ext4
> ...
> == 0 fsck failures, 0 mount failures
>
> xfs
> ...
> == 0 fsck failures, 0 mount failures


Did you check the content of the filesystem, to make sure that the files 
restored by fsck are actually correct?


I think ext4/xfs may be showing 0 files lost, but the files they report may
or may not contain the pre-damage content, while btrfs would just fess up
that it lost them if the checksums didn't agree.
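Przemek's question suggests one way to extend the test harness: record a content hash for every file before fuzzing and compare after repair, so that files which come back with silently different contents are counted separately from files that are missing. A minimal sketch in C, assuming both manifests have already been collected into arrays; `manifest_entry` and `compare_manifests` are hypothetical names, and FNV-1a is used only as a cheap illustrative hash, not something the original test used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One file in a manifest: its path and a hash of its contents. */
struct manifest_entry {
    const char *path;
    uint64_t hash;
};

/* 64-bit FNV-1a: cheap and not collision resistant, fine for a sketch. */
uint64_t fnv1a64(const void *data, size_t len)
{
    const unsigned char *p = data;
    uint64_t h = 0xcbf29ce484222325ULL;      /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 0x100000001b3ULL;               /* FNV prime */
    }
    return h;
}

/* Compare the manifest taken before corruption with the one taken after
 * fsck/mount/restore. A file missing afterwards counts as lost; a file
 * present with a different hash counts as silently changed. */
void compare_manifests(const struct manifest_entry *before, size_t n_before,
                       const struct manifest_entry *after, size_t n_after,
                       size_t *lost, size_t *changed)
{
    assert(lost != NULL && changed != NULL);
    *lost = 0;
    *changed = 0;
    for (size_t i = 0; i < n_before; i++) {
        const struct manifest_entry *match = NULL;
        for (size_t j = 0; j < n_after; j++) {
            if (strcmp(before[i].path, after[j].path) == 0) {
                match = &after[j];
                break;
            }
        }
        if (match == NULL)
            (*lost)++;
        else if (match->hash != before[i].hash)
            (*changed)++;
    }
}
```

Run over the ext4 and xfs results, this would show whether "0 files lost" also means "0 files changed". The nested linear search is adequate for a test image with a few thousand files.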


Re: Fedora 33 System-Wide Change proposal: Make btrfs the default file system for desktop variants

2020-07-06 Thread Stephen John Smoogen

On Mon, 6 Jul 2020 at 01:19, Chris Murphy  wrote:
>
> On Fri, Jul 3, 2020 at 8:40 PM Eric Sandeen  wrote:
> >
> > On 7/3/20 1:41 PM, Chris Murphy wrote:
> > > SSDs can fail in weird ways. Some spew garbage as they're failing,
> > > some go read-only. I've seen both. I don't have stats on how common it
> > > is for an SSD to go read-only as it fails, but once it happens you
> > > cannot fsck it. It won't accept writes. If it won't mount, your only
> > > chance to recover data is some kind of offline scrape tool. And Btrfs
> > > does have a very very good scrape tool, in terms of its success rate -
> > > UX is scary. But that can and will improve.
> >
> > Ok, you and Josef have both recommended the btrfs restore ("scrape")
> > tool as a next recovery step after fsck fails, and I figured we should
> > check that out, to see if that alleviates the concerns about
> > recoverability of user data in the face of corruption.
> >
> > I also realized that mkfs of an image isn't representative of an SSD
> > system typical of Fedora laptops, so I added "-m single" to mkfs,
> > because this will be the mkfs.btrfs default on SSDs (right?).  Based
> > on Josef's description of fsck's algorithm of throwing away any
> > block with a bad CRC this seemed worth testing.
> >
> > I also turned fuzzing /down/ to hitting 2048 bytes out of the 1G
> > image, or a bit less than 1% of the filesystem blocks, at random.
> > This is 1/4 the fuzzing rate from the original test.
> >
> > So: -m single, fuzz 2048 bytes of 1G image, run btrfsck --repair,
> > mount, mount w/ recovery, and then restore ("scrape") if all that
> > fails, see what we get.
>
> What's the probability of this kind of corruption occurring in the
> real world? If the probability is so low it can't practically be
> computed, how do we assess the risk? And if we can't assess risk,
> what's the basis of concern?
>

Aren't most disk failure tests 'huh, it somehow happened at least once
and I think this explains all these other failures too'? I know that
with giant clusters you can do more testing, but you also have a lot of
things like:

What is the chance that a disk will die over time? 100%
What is the chance that a disk died from this particular scenario?
0.0%
Reword the question slightly differently: what is the chance this
disk died from that scenario? 100%.

For the HPC computers we had a score of PhD statisticians coming up with
all kinds of papers on disk failure modes which, if asked one way,
would come up with practically 0% odds it would happen. However, all of
the disk failures had happened at least once over a time frame...
sometimes a short one, sometimes a long one, sometimes so often that
someone had to retract a paper because it was clear that while the
maths said it shouldn't happen, it did in real life.
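Stephen's point can be made concrete with a little arithmetic. Assuming each drive-year independently carries some small probability p of hitting a particular failure mode (real failures are often correlated, so independence is a simplification), the chance of seeing it at least once across n drive-years is 1 - (1 - p)^n. The numbers below are purely illustrative, and `prob_at_least_once` is a hypothetical helper name.

```c
#include <assert.h>

/* Probability of at least one occurrence in n independent trials, each
 * with probability p: 1 - (1 - p)^n, with (1 - p)^n computed by
 * exponentiation by squaring. For extremely small p, forming 1 - p in
 * double precision loses some accuracy; fine for a rough estimate. */
double prob_at_least_once(double p, unsigned long long n)
{
    assert(p >= 0.0 && p <= 1.0);
    double q = 1.0 - p;     /* chance of no occurrence in one trial */
    double none = 1.0;      /* accumulates q^n */
    while (n > 0) {
        if (n & 1ULL)
            none *= q;
        q *= q;
        n >>= 1;
    }
    return 1.0 - none;
}
```

For example, `prob_at_least_once(1e-6, 1000000)` returns roughly 0.632: a one-in-a-million event per drive-year is more likely than not to show up somewhere in a million drive-years.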

-- 
Stephen J Smoogen.

Re: Fedora 33 System-Wide Change proposal: Make btrfs the default file system for desktop variants

2020-07-06 Thread Josef Bacik


On 7/3/20 10:39 PM, Eric Sandeen wrote:
> On 7/3/20 1:41 PM, Chris Murphy wrote:
>> SSDs can fail in weird ways. Some spew garbage as they're failing,
>> some go read-only. I've seen both. I don't have stats on how common it
>> is for an SSD to go read-only as it fails, but once it happens you
>> cannot fsck it. It won't accept writes. If it won't mount, your only
>> chance to recover data is some kind of offline scrape tool. And Btrfs
>> does have a very very good scrape tool, in terms of its success rate -
>> UX is scary. But that can and will improve.
>
> Ok, you and Josef have both recommended the btrfs restore ("scrape")
> tool as a next recovery step after fsck fails, and I figured we should
> check that out, to see if that alleviates the concerns about
> recoverability of user data in the face of corruption.
>
> I also realized that mkfs of an image isn't representative of an SSD
> system typical of Fedora laptops, so I added "-m single" to mkfs,
> because this will be the mkfs.btrfs default on SSDs (right?).  Based
> on Josef's description of fsck's algorithm of throwing away any
> block with a bad CRC this seemed worth testing.
>
> I also turned fuzzing /down/ to hitting 2048 bytes out of the 1G
> image, or a bit less than 1% of the filesystem blocks, at random.
> This is 1/4 the fuzzing rate from the original test.
>
> So: -m single, fuzz 2048 bytes of 1G image, run btrfsck --repair,
> mount, mount w/ recovery, and then restore ("scrape") if all that
> fails, see what we get.
>
> I ran 50 loops, and got:
>
> 46 btrfsck failures
> 20 mount failures
>
> So it ran btrfs restore 20 times; of those, 11 runs lost all or
> substantially all of the files; 17 runs lost at least 1/3 of the
> files.


Hmm I wonder if some of my "ignore X failures" stuff got lost over the years, we 
should be able to recover far more than that.  I'll add it to the list of things 
to dig into this week.  Thanks,


Josef

Re: Fedora 33 System-Wide Change proposal: Make btrfs the default file system for desktop variants

2020-07-05 Thread Chris Murphy

On Fri, Jul 3, 2020 at 8:40 PM Eric Sandeen  wrote:
>
> On 7/3/20 1:41 PM, Chris Murphy wrote:
> > SSDs can fail in weird ways. Some spew garbage as they're failing,
> > some go read-only. I've seen both. I don't have stats on how common it
> > is for an SSD to go read-only as it fails, but once it happens you
> > cannot fsck it. It won't accept writes. If it won't mount, your only
> > chance to recover data is some kind of offline scrape tool. And Btrfs
> > does have a very very good scrape tool, in terms of its success rate -
> > UX is scary. But that can and will improve.
>
> Ok, you and Josef have both recommended the btrfs restore ("scrape")
> tool as a next recovery step after fsck fails, and I figured we should
> check that out, to see if that alleviates the concerns about
> recoverability of user data in the face of corruption.
>
> I also realized that mkfs of an image isn't representative of an SSD
> system typical of Fedora laptops, so I added "-m single" to mkfs,
> because this will be the mkfs.btrfs default on SSDs (right?).  Based
> on Josef's description of fsck's algorithm of throwing away any
> block with a bad CRC this seemed worth testing.
>
> I also turned fuzzing /down/ to hitting 2048 bytes out of the 1G
> image, or a bit less than 1% of the filesystem blocks, at random.
> This is 1/4 the fuzzing rate from the original test.
>
> So: -m single, fuzz 2048 bytes of 1G image, run btrfsck --repair,
> mount, mount w/ recovery, and then restore ("scrape") if all that
> fails, see what we get.

What's the probability of this kind of corruption occurring in the
real world? If the probability is so low it can't practically be
computed, how do we assess the risk? And if we can't assess risk,
what's the basis of concern?

> I ran 50 loops, and got:
>
> 46 btrfsck failures
> 20 mount failures
>
> So it ran btrfs restore 20 times; of those, 11 runs lost all or
> substantially all of the files; 17 runs lost at least 1/3 of the
> files.

Josef states that the reliability of ext4, xfs, and Btrfs is in the same
ballpark. He also reports one case in 10 years in which he failed to
recover anything. How do you square that with 11 complete failures,
trivially produced? Is there even a reason to suspect there's residual
risk?

When metadata is single profile, Btrfs is basically an early warning
system. The available research on uncorrectable errors, errors that
drive ECC does not catch, suggests that users are decently likely to
experience at least one block of corruption in the life of the drive.
And that it tends to get worse up until drive failure. But there is
much less chance to detect this, if the file system isn't also
checksumming the vastly larger payload on a drive: the data.

--
Chris Murphy

Re: Fedora 33 System-Wide Change proposal: Make btrfs the default file system for desktop variants

2020-07-05 Thread Chris Murphy

On Fri, Jul 3, 2020 at 9:14 AM Josef Bacik  wrote:
>
> On 7/3/20 9:37 AM, Eric Sandeen wrote:

> > Does btrfsck really never attempt to salvage a metadata block with a bad
> > CRC by validating its fields?
>
> No, I suppose we could, I'll add it to the list.  Generally speaking if
> there's a bad checksum detected we just attempt to recover based on what we
> couldn't get access to.  However that's difficult if it's a node.  If it's a
> leaf then usually you just lose some metadata that can be inferred from
> other data.  For example if you lose a leaf in the extent tree, well we can
> add all that information back once we've scanned the rest of the file system
> and know what extents are missing in the extent tree.
>
> Same goes for directory items, we detect that we are missing directory items,
> but we have references for them and so we add the missing directory items that
> were lost from that corrupt block.
>
> But again, if you lose a node you lose access to many leaves, which makes it
> more likely we'll lose something because we'll lose the other information we
> can use to recover what was lost.  The extent tree and checksum trees are
> exceptions to this, since they can be rebuilt from scratch, provided
> everything else is fine.
>
> And then if we did decide to validate nodes, we _might_ be ok, but we might
> end up with old versions of leaves because it happens to point at something
> that appears to be correct, but isn't really.  Our metadata changes all the
> time, so it's not outside the realm of possibilities that the corruption
> points at a seemingly valid piece of metadata, but isn't and thus makes us
> do something _really_ wrong.  Thanks,


Maybe it's reasonable to expect 'btrfs check --repair' to look for
plausible alternatives when using non-crypto checksums that mismatch.
But I'm not certain it's OK when using cryptographic checksums - how
do you distinguish between incidental corruption and a malicious
attack? The repair might be the attack vector.
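One kind of "plausible alternative" a repair tool could look for when a non-cryptographic checksum such as CRC-32C (the btrfs default) mismatches is a single flipped bit: try each bit flip and accept the block only if exactly one flip makes the checksum match. This is a hypothetical sketch, not something `btrfs check` is known to do, and `repair_single_bit` is an invented name. In line with Chris's concern, a match shows only that the result agrees with the stored checksum, not that the block can be trusted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32C (Castagnoli, reflected polynomial 0x82F63B78).
 * Slow but simple. */
uint32_t crc32c(const unsigned char *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int k = 0; k < 8; k++)
            crc = (crc & 1u) ? (crc >> 1) ^ 0x82F63B78u : crc >> 1;
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Try every single-bit flip of buf. If exactly one flip makes the CRC equal
 * `expected`, apply it and return that bit index. Return -1 (buf unchanged)
 * if no single flip explains the mismatch, or -2 (buf unchanged) if more
 * than one does, because the repair would then be ambiguous. */
long repair_single_bit(unsigned char *buf, size_t len, uint32_t expected)
{
    assert(buf != NULL);
    long found = -1;
    for (size_t bit = 0; bit < len * 8; bit++) {
        unsigned char mask = (unsigned char)(1u << (bit % 8));
        buf[bit / 8] ^= mask;
        int match = (crc32c(buf, len) == expected);
        buf[bit / 8] ^= mask;                 /* undo the trial flip */
        if (match) {
            if (found >= 0)
                return -2;
            found = (long)bit;
        }
    }
    if (found >= 0)
        buf[found / 8] ^= (unsigned char)(1u << (found % 8));
    return found;
}
```

The cost is quadratic in the block size: for a 16 KiB node it means 131,072 full CRC passes, so this illustrates the idea rather than something to run on every mismatch.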


-- 
Chris Murphy

Re: Fedora 33 System-Wide Change proposal: Make btrfs the default file system for desktop variants

2020-07-04 Thread Scott Schmit

On Mon, Jun 29, 2020 at 01:33:37PM -0400, Josef Bacik wrote:
> On 6/29/20 12:23 PM, J. Bruce Fields wrote:
> > Maybe not a desktop question, but do you know btrfs's change
> > attribute/i_version status?  Does it default to bumping i_version on
> > each change, or does that still need to be opted in?  And has anyone
> > measured the performance delta (i_version vs. noi_version) recently?
> > 
> 
> Yeah it defaults to bumping it all the time, we just use the normal inode
> changing infrastructure so it gets updated the same way everybody else does.
> AFAIK there's no way to opt out of it, unless there's a -o noiversion that
> exists?

There's both an iversion and noiversion option to mount:
https://man7.org/linux/man-pages/man8/mount.8.html#FILESYSTEM-INDEPENDENT_MOUNT%C2%A0OPTIONS

It appears that noiversion is the default (though the man page doesn't
say so) on ext4 & btrfs, unless my experimentation below is completely
off the mark (or something changed recently that my system hasn't picked
up yet):

$ touch file
$ lsattr -v file
628580  file
## metadata-only change:
$ touch file
$ lsattr -v file
628580  file
## ^ no change...
## data change:
$ echo test > file
$ lsattr -v file
628580  file
## ^ still no change
$ rm file
$ touch file
$ lsattr -v file
628582  file
## ^ now different
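For reference, `lsattr -v` prints the file's "version/generation number", which it reads with the `FS_IOC_GETVERSION` ioctl. As far as I understand, ext4 and btrfs answer that ioctl with the inode *generation* number, which changes when an inode is reallocated (consistent with the value changing only after the `rm` and `touch` above), rather than with i_version, so this experiment may not be observing the change attribute at all. A minimal Linux-only sketch that reads the same value; `get_inode_version` is a hypothetical helper name.

```c
#include <assert.h>
#include <fcntl.h>
#include <linux/fs.h>     /* FS_IOC_GETVERSION */
#include <sys/ioctl.h>
#include <unistd.h>

/* Read the number `lsattr -v` shows for `path`. Linux only.
 * Returns 0 and stores the value on success, -1 on any error (file
 * missing, or a filesystem that does not support the ioctl). */
int get_inode_version(const char *path, unsigned int *out)
{
    assert(path != NULL && out != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    unsigned int version = 0;   /* ext4 and btrfs write an int-sized value */
    int rc = ioctl(fd, FS_IOC_GETVERSION, &version);
    close(fd);
    if (rc < 0)
        return -1;
    *out = version;
    return 0;
}
```

Calling it before and after each `touch` and `echo` step above should reproduce the first column of the `lsattr -v` output.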

-- 
Scott Schmit

Re: Fedora 33 System-Wide Change proposal: Make btrfs the default file system for desktop variants

2020-07-03 Thread Eric Sandeen

On 7/3/20 1:41 PM, Chris Murphy wrote:
> SSDs can fail in weird ways. Some spew garbage as they're failing,
> some go read-only. I've seen both. I don't have stats on how common it
> is for an SSD to go read-only as it fails, but once it happens you
> cannot fsck it. It won't accept writes. If it won't mount, your only
> chance to recover data is some kind of offline scrape tool. And Btrfs
> does have a very very good scrape tool, in terms of its success rate -
> UX is scary. But that can and will improve.

Ok, you and Josef have both recommended the btrfs restore ("scrape")
tool as a next recovery step after fsck fails, and I figured we should
check that out, to see if that alleviates the concerns about
recoverability of user data in the face of corruption.

I also realized that mkfs of an image isn't representative of an SSD
system typical of Fedora laptops, so I added "-m single" to mkfs,
because this will be the mkfs.btrfs default on SSDs (right?).  Based
on Josef's description of fsck's algorithm of throwing away any
block with a bad CRC this seemed worth testing.

I also turned fuzzing /down/ to hitting 2048 bytes out of the 1G
image, or a bit less than 1% of the filesystem blocks, at random.
This is 1/4 the fuzzing rate from the original test.

So: -m single, fuzz 2048 bytes of 1G image, run btrfsck --repair,
mount, mount w/ recovery, and then restore ("scrape") if all that
fails, see what we get.
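The fuzzing step can be sketched as follows, assuming the image is held in memory and that "filesystem blocks" means 4 KiB blocks (an assumption; the original script is not shown). With 4 KiB blocks a 1 GiB image has 262,144 blocks, so 2048 random byte hits touch at most 2048 of them, about 0.78%, which matches "a bit less than 1%". `fuzz_image` is a hypothetical name, and a seeded xorshift PRNG is used so that a failing run can be replayed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* xorshift64: small deterministic PRNG; the state must be non-zero. */
static uint64_t xorshift64(uint64_t *state)
{
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    return *state = x;
}

/* Overwrite `nbytes` randomly chosen bytes of `img` with random values and
 * return how many distinct blocks of `block_size` bytes were hit.
 * Returns (size_t)-1 if the tracking bitmap cannot be allocated. */
size_t fuzz_image(unsigned char *img, size_t img_size, size_t block_size,
                  size_t nbytes, uint64_t seed)
{
    assert(img != NULL && img_size > 0 && block_size > 0 && seed != 0);
    size_t nblocks = (img_size + block_size - 1) / block_size;
    unsigned char *hit = calloc(nblocks, 1);
    if (hit == NULL)
        return (size_t)-1;
    size_t distinct = 0;
    uint64_t state = seed;
    for (size_t i = 0; i < nbytes; i++) {
        size_t off = (size_t)(xorshift64(&state) % img_size);
        /* the new value can occasionally equal the old one */
        img[off] = (unsigned char)xorshift64(&state);
        if (!hit[off / block_size]) {
            hit[off / block_size] = 1;
            distinct++;
        }
    }
    free(hit);
    return distinct;
}
```

On a 1 GiB buffer, `fuzz_image(img, 1UL << 30, 4096, 2048, seed)` would report how many distinct 4 KiB blocks a run actually damaged, which is at most 2048.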

I ran 50 loops, and got:

46 btrfsck failures
20 mount failures

So it ran btrfs restore 20 times; of those, 11 runs lost all or
substantially all of the files; 17 runs lost at least 1/3 of the 
files.

-Eric

Re: Fedora 33 System-Wide Change proposal: Make btrfs the default file system for desktop variants

2020-07-03 Thread Chris Murphy

SSDs can fail in weird ways. Some spew garbage as they're failing,
some go read-only. I've seen both. I don't have stats on how common it
is for an SSD to go read-only as it fails, but once it happens you
cannot fsck it. It won't accept writes. If it won't mount, your only
chance to recover data is some kind of offline scrape tool. And Btrfs
does have a very very good scrape tool, in terms of its success rate -
UX is scary. But that can and will improve.


-- 
Chris Murphy

Re: Fedora 33 System-Wide Change proposal: Make btrfs the default file system for desktop variants

2020-07-03 Thread Josef Bacik

On 7/3/20 9:37 AM, Eric Sandeen wrote:
> On 7/1/20 2:50 PM, Josef Bacik wrote:
>> On 7/1/20 2:24 PM, Matthew Miller wrote:
>>> On Wed, Jul 01, 2020 at 06:54:02AM +, Zbigniew Jędrzejewski-Szmek wrote:
>>>> Making btrfs opt-in for F33 and (assuming the results go well) opt-out for
>>>> F34 could be a good option. I know technically it is already opt-in, but
>>>> it's not very visible or popular. We could make the btrfs option more
>>>> prominent and ask people to pick it if they are ready to handle potential
>>>> fallout.
>>>
>>> I'm leaning towards recommending this as well. I feel like we don't have
>>> good data to make a decision on -- the work that Red Hat did previously when
>>> making a decision was 1) years ago and 2) server-focused, and the Facebook
>>> production usage is encouraging but also not the same use case. I'm
>>> particularly concerned about metadata corruption fragility as noted in the
>>> Usenix paper. (It'd be nice if we could do something about that!)
>>
>> There's only so much we can do about this. I've sent up patches to ignore
>> failed global trees to allow users to more easily recover data in case of
>> corruption in the case of global trees, but as they say if only 1 bit is off
>> in a node, we throw the whole node away. And throwing a node away means you
>> lose access to any of its children, which could be a large chunk of the file
>> system.
>>
>> This sounds like a "wtf, why are you doing this btrfs?" sort of thing, but
>> this is just the reality of using checksums. It's a checksum, not ECC. We
>> don't know _which_ bits are fucked, we just know something's fucked, so we
>> throw it all away. If you have RAID or DUP then we go read the other copy,
>> and fix the broken copy if we find a good copy. If we don't, well then
>> there's nothing really we can do.
>
> There is often a path forward when a bad metadata checksum is detected,
> e.g. e2fsck:
>
> scan_extent_node() {
> ...
>
>         /* Failed csum but passes checks?  Ask to fix checksum. */
>         if (failed_csum &&
>             fix_problem(ctx, PR_1_EXTENT_ONLY_CSUM_INVALID, pctx)) {
>                 pb->inode_modified = 1;
>                 pctx->errcode = ext2fs_extent_replace(ehandle, 0, );
>                 if (pctx->errcode)
>                         return;
>         }
>
> It does similarly for many types of metadata:
>
> /* inode passes checks, but checksum does not match inode */
> #define PR_1_INODE_ONLY_CSUM_INVALID        0x010068
> --
> /* Inode extent block passes checks, but checksum does not match extent */
> #define PR_1_EXTENT_ONLY_CSUM_INVALID       0x01006A
> --
> /* Inode extended attribute block passes checks, but checksum does not
>  * match block. */
> #define PR_1_EA_BLOCK_ONLY_CSUM_INVALID     0x01006C
> --
> /* dir leaf node passes checks, but fails checksum */
> #define PR_2_LEAF_NODE_ONLY_CSUM_INVALID    0x02004D
>
> Does btrfsck really never attempt to salvage a metadata block with a bad CRC
> by validating its fields?

No, I suppose we could, I'll add it to the list. Generally speaking if there's
a bad checksum detected we just attempt to recover based on what we couldn't get
access to. However that's difficult if it's a node. If it's a leaf then
usually you just lose some metadata that can be inferred from other data. For
example if you lose a leaf in the extent tree, well we can add all that
information back once we've scanned the rest of the file system and know what
extents are missing in the extent tree.

Same goes for directory items, we detect that we are missing directory items,
but we have references for them and so we add the missing directory items that
were lost from that corrupt block.

But again, if you lose a node you lose access to many leaves, which makes it
more likely we'll lose something because we'll lose the other information we can
use to recover what was lost. The extent tree and checksum trees are exceptions
to this, since they can be rebuilt from scratch, provided everything else is fine.
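The claim that the extent tree can be rebuilt from scratch rests on redundancy: every extent in use is also referenced from the other trees. A toy sketch of that idea, assuming the scan has already produced a list of (bytenr, length) references: sort them and collapse identical ranges into one record with a reference count (shared extents, for example from snapshots or reflinks, end up with more than one). Real btrfs extent items carry much more (backrefs, flags, generations); `rebuild_extents` and the struct names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* A reference found while walking the other trees: the on-disk byte
 * range that some file extent points at. */
struct extent_ref {
    uint64_t bytenr;
    uint64_t num_bytes;
};

/* One rebuilt record per distinct on-disk extent, with its ref count. */
struct extent_rec {
    uint64_t bytenr;
    uint64_t num_bytes;
    unsigned refs;
};

static int cmp_ref(const void *a, const void *b)
{
    const struct extent_ref *x = a, *y = b;
    if (x->bytenr != y->bytenr)
        return x->bytenr < y->bytenr ? -1 : 1;
    if (x->num_bytes != y->num_bytes)
        return x->num_bytes < y->num_bytes ? -1 : 1;
    return 0;
}

/* Sort the references (in place) and collapse identical ranges.
 * `out` must have room for n records; returns the number written. */
size_t rebuild_extents(struct extent_ref *refs, size_t n,
                       struct extent_rec *out)
{
    assert((refs != NULL && out != NULL) || n == 0);
    if (n == 0)
        return 0;
    qsort(refs, n, sizeof *refs, cmp_ref);
    size_t m = 0;
    for (size_t i = 0; i < n; i++) {
        if (m > 0 && out[m - 1].bytenr == refs[i].bytenr &&
            out[m - 1].num_bytes == refs[i].num_bytes) {
            out[m - 1].refs++;              /* another owner of this extent */
        } else {
            out[m].bytenr = refs[i].bytenr;
            out[m].num_bytes = refs[i].num_bytes;
            out[m].refs = 1;
            m++;
        }
    }
    return m;
}
```

The same reasoning explains why losing an interior node is worse: the information needed to regenerate what it pointed at may itself sit below that node.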

And then if we did decide to validate nodes, we _might_ be ok, but we might end
up with old versions of leaves because it happens to point at something that
appears to be correct, but isn't really. Our metadata changes all the time, so
it's not outside the realm of possibilities that the corruption points at a
seemingly valid piece of metadata, but isn't and thus makes us do something
_really_ wrong. Thanks,

Josef

Re: Fedora 33 System-Wide Change proposal: Make btrfs the default file system for desktop variants

2020-07-03 Thread Eric Sandeen

On 7/1/20 2:50 PM, Josef Bacik wrote:
> On 7/1/20 2:24 PM, Matthew Miller wrote:
>> On Wed, Jul 01, 2020 at 06:54:02AM +, Zbigniew Jędrzejewski-Szmek wrote:
>>> Making btrfs opt-in for F33 and (assuming the results go well) opt-out for
>>> F34 could be a good option. I know technically it is already opt-in, but
>>> it's not very visible or popular. We could make the btrfs option more
>>> prominent and ask people to pick it if they are ready to handle potential
>>> fallout.
>>
>> I'm leaning towards recommending this as well. I feel like we don't have
>> good data to make a decision on -- the work that Red Hat did previously when
>> making a decision was 1) years ago and 2) server-focused, and the Facebook
>> production usage is encouraging but also not the same use case. I'm
>> particularly concerned about metadata corruption fragility as noted in the
>> Usenix paper. (It'd be nice if we could do something about that!)
>>
> 
> There's only so much we can do about this.  I've sent up patches to ignore 
> failed global trees to allow users to more easily recover data in case of 
> corruption in the case of global trees, but as they say if only 1 bit is off 
> in a node, we throw the whole node away.  And throwing a node away means you 
> lose access to any of its children, which could be a large chunk of the file 
> system.
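The "lose a node, lose its children" behaviour can be illustrated with a toy tree in which any block whose checksum fails is dropped whole during traversal. The second helper gives the worst case for a full tree: losing one node at level L hides up to fanout^L leaves. By my arithmetic (16 KiB nodesize, 101-byte header, 33-byte key pointers) a btrfs node holds about 493 child pointers, so one bad level-2 node could hide on the order of 240,000 leaves; treat that as an estimate. All names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Toy tree node: level 0 is a leaf; interior nodes point to children.
 * `csum_ok` stands in for "this block's checksum verified". */
struct tnode {
    int level;
    int csum_ok;
    size_t nchildren;
    struct tnode **children;
};

/* Count leaves that are still reachable. A block whose checksum fails is
 * discarded whole, so everything beneath it is unreachable too. */
size_t reachable_leaves(const struct tnode *n)
{
    if (n == NULL || !n->csum_ok)
        return 0;
    if (n->level == 0)
        return 1;
    size_t total = 0;
    for (size_t i = 0; i < n->nchildren; i++)
        total += reachable_leaves(n->children[i]);
    return total;
}

/* Worst case when every interior node is full: losing one node at
 * `level` hides fanout^level leaves. */
unsigned long long leaves_under(unsigned long long fanout, int level)
{
    assert(level >= 0);
    unsigned long long n = 1;
    while (level-- > 0)
        n *= fanout;
    return n;
}
```

Real nodes are rarely full, so `leaves_under(493, 2)` (243,049) is an upper bound, not a typical loss.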
> 
> This sounds like a "wtf, why are you doing this btrfs?" sort of thing, but
> this is just the reality of using checksums.  It's a checksum, not ECC.  We
> don't know _which_ bits are fucked, we just know something's fucked, so we
> throw it all away.  If you have RAID or DUP then we go read the other copy,
> and fix the broken copy if we find a good copy.  If we don't, well then
> there's nothing really we can do.
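The DUP/RAID1 read path described here (verify one copy, fall back to the other, rewrite the bad one) can be sketched as below, assuming two in-memory copies that share one expected CRC-32C. When neither copy verifies, all the checksum can report is that the block is bad, not which bits are wrong. `read_dup` is a hypothetical name, and this is a simplification of what btrfs actually does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bitwise CRC-32C (Castagnoli), the default btrfs checksum algorithm. */
static uint32_t crc32c(const unsigned char *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int k = 0; k < 8; k++)
            crc = (crc & 1u) ? (crc >> 1) ^ 0x82F63B78u : crc >> 1;
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Read a block stored twice. Returns the index of the first copy whose
 * checksum matches `expected` and, if the other copy is bad, overwrites it
 * with the good one. Returns -1 if neither copy verifies. */
int read_dup(unsigned char *copies[2], size_t len, uint32_t expected)
{
    assert(copies != NULL && len > 0);
    int good = -1;
    for (int i = 0; i < 2 && good < 0; i++)
        if (crc32c(copies[i], len) == expected)
            good = i;
    if (good < 0)
        return -1;                 /* detection only: nothing to repair from */
    int other = 1 - good;
    if (crc32c(copies[other], len) != expected)
        memcpy(copies[other], copies[good], len);   /* rewrite the bad copy */
    return good;
}
```

With the `single` metadata profile there is no second copy, so every mismatch ends in the `-1` branch.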

There is often a path forward when a bad metadata checksum is detected,
e.g. e2fsck:

scan_extent_node() {
...

        /* Failed csum but passes checks?  Ask to fix checksum. */
        if (failed_csum &&
            fix_problem(ctx, PR_1_EXTENT_ONLY_CSUM_INVALID, pctx)) {
                pb->inode_modified = 1;
                pctx->errcode = ext2fs_extent_replace(ehandle, 0, );
                if (pctx->errcode)
                        return;
        }

It does similarly for many types of metadata:

/* inode passes checks, but checksum does not match inode */
#define PR_1_INODE_ONLY_CSUM_INVALID        0x010068
--
/* Inode extent block passes checks, but checksum does not match extent */
#define PR_1_EXTENT_ONLY_CSUM_INVALID       0x01006A
--
/* Inode extended attribute block passes checks, but checksum does not
 * match block. */
#define PR_1_EA_BLOCK_ONLY_CSUM_INVALID     0x01006C
--
/* dir leaf node passes checks, but fails checksum */
#define PR_2_LEAF_NODE_ONLY_CSUM_INVALID    0x02004D

Does btrfsck really never attempt to salvage a metadata block with a bad CRC by
validating its fields?
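A sketch of the e2fsck-style approach, applied to a hypothetical metadata leaf: if the checksum fails but the fields still pass structural checks (magic number, item count, keys strictly sorted), treat it as a checksum-only error and rewrite the checksum; otherwise discard the block. The layout, `LEAF_MAGIC`, and the function names are invented for illustration, and `allow_fix` stands in for e2fsck's `fix_problem()` prompt. As Josef's reply in this thread points out, a block can pass such checks and still be stale or wrong, so a real tool would need more context than this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LEAF_MAGIC 0x4C454146u      /* hypothetical on-disk magic */
#define LEAF_MAX_ITEMS 8

/* Hypothetical leaf layout: a CRC-32C covering every byte after `csum`. */
struct leaf {
    uint32_t csum;
    uint32_t magic;
    uint32_t nritems;
    uint32_t reserved;              /* explicit padding, kept zero */
    uint64_t keys[LEAF_MAX_ITEMS];
};

/* Bitwise CRC-32C (Castagnoli). */
static uint32_t crc32c(const unsigned char *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int k = 0; k < 8; k++)
            crc = (crc & 1u) ? (crc >> 1) ^ 0x82F63B78u : crc >> 1;
    }
    return crc ^ 0xFFFFFFFFu;
}

static uint32_t leaf_csum(const struct leaf *l)
{
    size_t off = offsetof(struct leaf, magic);
    return crc32c((const unsigned char *)l + off, sizeof *l - off);
}

/* Structural checks that do not rely on the checksum. */
static int leaf_fields_valid(const struct leaf *l)
{
    if (l->magic != LEAF_MAGIC || l->nritems > LEAF_MAX_ITEMS)
        return 0;
    for (uint32_t i = 1; i < l->nritems; i++)
        if (l->keys[i - 1] >= l->keys[i])   /* keys must be strictly sorted */
            return 0;
    return 1;
}

enum leaf_verdict { LEAF_OK, LEAF_CSUM_FIXED, LEAF_DISCARD };

/* Checksum-only error with sane fields: rewrite the checksum if allowed.
 * Anything else is discarded, as btrfs does today. */
enum leaf_verdict check_leaf(struct leaf *l, int allow_fix)
{
    assert(l != NULL);
    if (leaf_csum(l) == l->csum)
        return LEAF_OK;
    if (allow_fix && leaf_fields_valid(l)) {
        l->csum = leaf_csum(l);
        return LEAF_CSUM_FIXED;
    }
    return LEAF_DISCARD;
}
```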

-Eric

Re: Fedora 33 System-Wide Change proposal: Make btrfs the default file system for desktop variants

2020-07-03 Thread Nicolas Mailhot via devel

On Thursday, 02 July 2020 at 17:44 -0400, Josef Bacik wrote:
> However just because we know something 
> went wrong doesn't mean we can do anything about it, it just means
> that the user knows now that they need to restore from backups 

That’s a perfect answer for an Enterprise server setup with systematic
backup/restore procedures.

For workstations? Even in an Enterprise context? Not so much.

Regards,

-- 
Nicolas Mailhot

Re: Fedora 33 System-Wide Change proposal: Make btrfs the default file system for desktop variants

2020-07-02 Thread Josef Bacik

Yeah I mean the general discussion, not you specifically.  Thanks,

Josef

On Thu, Jul 2, 2020 at 8:38 PM Eric Sandeen  wrote:

> On 7/2/20 4:44 PM, Josef Bacik wrote:
> > We're talking about this issue like it's reasonable that xfs and ext4
> are going to allow the user to get back a bunch of data they don't know is
> ok or not. We're also talking about it like the user should be able to
> carry on his happy merry way.  In these cases the drive is dying and needs
> to be shredded, and a new install needs to happen and a restore from
> backups needs to happen.  Is the btrfs failure much less user friendly?  No
> doubt about it.  Is it any comfort at all when a user shows up and we say
> "where are your backups" and they say "what backups?", no.  But if we're
> going to talk about this like ext4 and xfs are much better because they
> give you the _appearance_ that your data is fine, that's a bit disingenuous.
>
> If I had talked about it like that, it would have been disingenuous.
>
> But I didn't; this was an investigation of resiliency to metadata
> corruption, not data error detection, and to what degree metadata
> corruption can render files or even entire filesystems unreachable after
> normal administrative recovery efforts.
>
> -Eric

Re: Fedora 33 System-Wide Change proposal: Make btrfs the default file system for desktop variants

On 7/2/20 4:44 PM, Josef Bacik wrote:
> We're talking about this issue like it's reasonable that xfs and ext4 are 
> going to allow the user to get back a bunch of data they don't know is ok or 
> not. We're also talking about it like the user should be able to carry on his 
> happy merry way.  In these cases the drive is dying and needs to be shredded, 
> and a new install needs to happen and a restore from backups needs to happen. 
>  Is the btrfs failure much less user friendly?  No doubt about it.  Is it any 
> comfort at all when a user shows up and we say "where are your backups" and 
> they say "what backups?", no.  But if we're going to talk about this like 
> ext4 and xfs are much better because they give you the _appearance_ that your 
> data is fine, that's a bit disingenuous.

If I had talked about it like that, it would have been disingenuous.

But I didn't; this was an investigation of resiliency to metadata corruption, 
not data error detection, and to what degree metadata corruption can render 
files or even entire filesystems unreachable after normal administrative 
recovery efforts.

-Eric

Re: Fedora 33 System-Wide Change proposal: Make btrfs the default file system for desktop variants

2020-07-02 Thread Konstantin Kharlamov

On Thu, 2020-07-02 at 21:37 +0300, Konstantin Kharlamov wrote:
> On Thu, 2020-07-02 at 09:44 +0200, Florian Weimer wrote:
> > * Konstantin Kharlamov:
> > 
> > > FWIW, I was just thinking about it, and I came up with example you
> > > may like which shows exactly why BTRFS is bad for HDD. Consider
> > > development process. It includes rewriting source files over and
> > > over: you do `git checkout foo` and files are overwritten, you
> > > change a file in text editor, and it gets overwritten. And since
> > > BTRFS is CoW, it will always write files to a new place.
> > 
> > Editors that make a backup copy typically do not overwrite files in
> > place.  They rename the file to the backup location and then write the
> > new file.
> > 
> > git checkout unlinks changed files first, before writing them anew
> > from scratch.
> > 
> > A COW file system does not make a difference for these use cases
> > because there is already COW at the application level.
> > 
> > The GNU assembler truncates the output object file first.  On XFS,
> > that triggers relocation to a new file system location as well, even
> > if the output file size (or contents) does not change.  So that
> > scenario is essentially COW as well today.
> 
> Per my understanding, what happens when you write a new file and delete an old
> one is that the block the old file was occupying gets freed.
> 
> Then, if you copy the file again, the file system should find a free block to
> write this copy into. And this block would likely be the one that got freed
> previously.
> 
> So, well, it is indeed COW, but not the one BTRFS does. It's a COW that copies
> a file back and forth between two blocks :) This is kinda HDD-friendly COW :)
> 
> BTRFS, on the other hand, will not rewrite an older block unless it's out of
> new ones.

Just to clarify: I do not claim this is how ext4 or xfs works. This simplistic
explanation is just my intuition about how a non-COW fs would work, but of
course there can be reasons for them to behave differently. If someone knows
better, corrections are welcome. What I do know, though, is how a COW FS works,
because I did work a little with ZFS at my day job.
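The two-block picture above can be made concrete with a toy model. The C sketch below assumes a single-block file rewritten by "write the new copy, then free the old copy" under two hypothetical allocation policies: lowest free block first (the non-COW intuition described) and a next-fit cursor that keeps moving forward (the described btrfs behaviour). As stated in the message itself, neither is how the real ext4, XFS or btrfs allocators work; `toy_fs`, `alloc_first_fit`, `alloc_next_fit` and `rewrite` are illustrative names only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NBLOCKS 8  /* size of the toy disk, in blocks */

/* Toy free-space map: used[i] is true when block i holds data. */
struct toy_fs {
    bool used[NBLOCKS];
    size_t cursor;  /* next-fit position for the "always somewhere new" policy */
};

/* Lowest-numbered free block: a just-freed block is reused immediately. */
static int alloc_first_fit(struct toy_fs *fs)
{
    for (size_t i = 0; i < NBLOCKS; i++)
        if (!fs->used[i]) { fs->used[i] = true; return (int)i; }
    return -1;
}

/* Keep moving forward from the cursor, wrapping only at the end of the disk. */
static int alloc_next_fit(struct toy_fs *fs)
{
    for (size_t n = 0; n < NBLOCKS; n++) {
        size_t i = (fs->cursor + n) % NBLOCKS;
        if (!fs->used[i]) {
            fs->used[i] = true;
            fs->cursor = (i + 1) % NBLOCKS;
            return (int)i;
        }
    }
    return -1;
}

/* Rewrite a one-block file: allocate and write the new copy first, then free
 * the old block (pass -1 for a brand-new file). Returns the new block. */
int rewrite(struct toy_fs *fs, int old_block, int (*alloc)(struct toy_fs *))
{
    int nb = alloc(fs);
    assert(nb >= 0);
    if (old_block >= 0)
        fs->used[old_block] = false;
    return nb;
}
```

Rewriting the file repeatedly lands it on blocks 0, 1, 0, 1, ... under the first policy and on 0, 1, 2, ... under the second, which only wraps back to block 0 after reaching the end of the toy disk.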

Re: Fedora 33 System-Wide Change proposal: Make btrfs the default file system for desktop variants

2020-07-02 Thread Josef Bacik


On 7/2/20 4:38 PM, Eric Sandeen wrote:
> On 7/1/20 12:50 PM, Chris Murphy wrote:
>
> ...
>
>> Integrity checking is highly valued by some and less by others.
>> Considering that we know hardware isn't 100% reliable, and doesn't
>> always report its own failures as expected, and hence why most file
>> systems now at least checksum metadata, it's not persuasive to me that
>> the data should be left unchecked, and corruption ought to be handled
>> by user space somehow.
>
> There's a flip side to this coin - in my experience, if the right btrfs
> metadata blocks experience this disk corruption, there can be
> a complete inability to recover the btrfs filesystem from that error -
> i.e. it won't mount, and btrfsck --repair won't get it to a mountable
> state.
>
> So if we're saying disk corruption happens often enough that data
> checksumming is critical, then it happens often enough that metadata
> recovery is at least as critical.
>
> I've been trying to quantify this and have not come up with a particularly
> compelling test scenario, because it involves purposefully (though at random)
> corrupting enough blocks on a filesystem image that a critical block gets
> hit, so it looks synthetic.  But the net result is frequently a filesystem
> where btrfsck and/or mount fails, and at first blush this type of failure
> happens much more often than on other filesystems.[1]
>
> I think Josef has alluded to this situation as well.  To me, that's a big
> concern.  Not trying to be a wet blanket here but I think this needs to be
> carefully investigated and evaluated to understand what impact it may have
> on Fedora btrfs users and their ability to recover their data in the face
> of metadata corruption, because it looks to me like a definite btrfs weak
> spot.

Yeah this is what I've said many times over the last 3 weeks.  Btrfs is more 
vulnerable to metadata corruption.


Now there are things that we can do to mitigate this.  I have one patch up to
handle one of the main cases (a corrupt global tree).  The next patch set will
keep entire metadata trees around for longer, as long as we have space to
handle it.  These two things will drastically improve the situation, but of
course if I'm being evil we can still end up in a bad spot.  These patches are
not hard or controversial; they'll likely land in 5.9, which will be what F33
ships with (if I'm doing my math right).


And this sort of ignores the other side of the coin.  fsfuzzer isn't just 
corrupting metadata, it's corrupting data.  Btrfs is the only file system that's 
going to notice that and let the user know.


Checksumming is great because it lets the user know things are going wrong 
before they go catastrophically wrong.  However just because we know something 
went wrong doesn't mean we can do anything about it, it just means that the user 
knows now that they need to restore from backups and find a new drive.  These 
features do not mean you are absolved of good practices.  If you care about 
data, you need to have it in multiple places.  End of story.  Btrfs is just 
going to let you know in advance that things are going wrong.


We're talking about this issue like it's reasonable that xfs and ext4 are going 
to allow the user to get back a bunch of data they don't know is ok or not. 
We're also talking about it like the user should be able to carry on his happy 
merry way.  In these cases the drive is dying and needs to be shredded, and a 
new install needs to happen and a restore from backups needs to happen.  Is the 
btrfs failure much less user friendly?  No doubt about it.  Is it any comfort at 
all when a user shows up and we say "where are your backups" and they say "what 
backups?", no.  But if we're going to talk about this like ext4 and xfs are much 
better because they give you the _appearance_ that your data is fine, that's a 
bit disingenuous.


"Well what if it was just /usr."  Sure, then you got lucky and you could copy 
things off.  But what if it wasn't?  That's the measure that's being applied to 
btrfs here.  Is it likely that random corruption is going to be so bad that you 
end up with an unmountable file system?  It's about as likely that the random 
corruption is on your dissertation or your family photographs.  The difference 
is that btrfs will tell you that your dissertation or your family photographs 
are now bad, whereas ext4 and xfs will not.


These are tradeoffs, no doubt.  Every file system choice is a series of
tradeoffs.  We're arguing/optimizing for the narrowest use case.  Arguments can
be made either way, but in the end, is it important enough to not move ahead
with btrfs?  Thanks,


Josef

Re: Fedora 33 System-Wide Change proposal: Make btrfs the default file system for desktop variants

On 7/2/20 3:58 PM, José Abílio Matos wrote:
> On Thursday, 2 July 2020 21.38.46 WEST Eric Sandeen wrote:
>> 3 files in lost+found, -1 files gone/unreachable
> 
> This last line from the xfs test seems suspicious (the -1 file gone). :-)

It is weird, but it shows I didn't fudge the numbers ;)

Directory repair may have inadvertently created a file or something; I'm not sure.

-Eric

Re: Fedora 33 System-Wide Change proposal: Make btrfs the default file system for desktop variants

2020-07-02 Thread José Abílio Matos

On Thursday, 2 July 2020 21.38.46 WEST Eric Sandeen wrote:
> 3 files in lost+found, -1 files gone/unreachable

This last line from the xfs test seems suspicious (the -1 file gone). :-)
-- 
José Abílio


Re: Fedora 33 System-Wide Change proposal: Make btrfs the default file system for desktop variants

2020-07-02 Thread Roberto Ragusa


On 2020-07-01 23:04, Michael Catanzaro wrote:
> On Wed, Jul 1, 2020 at 11:01 pm, Roberto Ragusa  wrote:
>> The real solution would be to make wise usage of LVM, for example by not
>> allocating 100% of the extents at the beginning (or even dm-thin) and/or
>> using filesystems where a shrink is supported (I'm here blaming xfs
>> for not having this, while ext4 has).
>
> Leaving space unallocated doesn't gain us anything because the user still has
> to manually resize both logical volumes and the partitions inside them. Our
> default needs to be something that doesn't require users to resize partitions.

But those are things that can be done in a few seconds with one or two commands.
Attempts to make easy things easier lead to making other things difficult:
some not-so-inexperienced users will find themselves with their disk having
only one big partition, no LVM, everything inside (system+data), trying to
decipher the suggestion found on a forum: "with btrfs you can sort of format /
without losing /home even if you do not have separate partitions".

--
   Roberto Ragusa        mail at robertoragusa.it

Re: Fedora 33 System-Wide Change proposal: Make btrfs the default file system for desktop variants

On 7/1/20 12:50 PM, Chris Murphy wrote:

...

> Integrity checking is highly valued by some and less by others.
> Considering that we know hardware isn't 100% reliable, and doesn't
> always report its own failures as expected, and hence why most file
> systems now at least checksum metadata, it's not persuasive to me that
> the data should be left unchecked, and corruption ought to be handled
> by user space somehow.

There's a flip side to this coin - in my experience, if the right btrfs
metadata blocks experience this disk corruption, there can be
a complete inability to recover the btrfs filesystem from that error -
i.e. it won't mount, and btrfsck --repair won't get it to a mountable
state.

So if we're saying disk corruption happens often enough that data
checksumming is critical, then it happens often enough that metadata
recovery is at least as critical.

I've been trying to quantify this and have not come up with a particularly
compelling test scenario, because it involves purposefully (though at random)
corrupting enough blocks on a filesystem image that a critical block gets
hit, so it looks synthetic.  But the net result is frequently a filesystem
where btrfsck and/or mount fails, and at first blush this type of failure
happens much more often than on other filesystems.[1]

I think Josef has alluded to this situation as well.  To me, that's a big
concern.  Not trying to be a wet blanket here but I think this needs to be
carefully investigated and evaluated to understand what impact it may have
on Fedora btrfs users and their ability to recover their data in the face
of metadata corruption, because it looks to me like a definite btrfs weak
spot.

-Eric

[1] some details - I used the mangle.c fuzzer from fsfuzzer, and modified
it so that it corrupts 8192 bytes of an image, which in fs terms
can be up to 8192 filesystem blocks.  I also avoided the first 4k so that
any filesystem signature was not damaged.
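A fuzzer in the spirit described above (overwrite N randomly chosen bytes, leave the first 4 KiB alone) fits in a few lines of C. This is not the modified mangle.c, which is at the URL given further down; `corrupt_image` and its parameters are hypothetical. Because each damaged byte falls in at most one block, 8192 bytes on a 1 GiB image with 4 KiB blocks (262,144 blocks) touch at most 8192 / 262,144 = 3.125% of them, consistent with the "up to 3%" figure, assuming 4 KiB blocks.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Overwrite `nbytes` randomly chosen bytes of the image at `path` with random
 * values, never touching the first `skip` bytes (e.g. 4096, so the filesystem
 * signature survives). Offsets may repeat and a random value may equal the old
 * one, so "up to" nbytes bytes actually change. Returns bytes written or -1. */
long corrupt_image(const char *path, long nbytes, long skip, unsigned seed)
{
    assert(path != NULL && nbytes >= 0 && skip >= 0);
    FILE *f = fopen(path, "r+b");
    if (!f)
        return -1;
    if (fseek(f, 0, SEEK_END) != 0) {
        fclose(f);
        return -1;
    }
    long size = ftell(f);
    if (size <= skip) {
        fclose(f);
        return -1;
    }
    srand(seed);
    long written = 0;
    for (long i = 0; i < nbytes; i++) {
        /* RAND_MAX may be as small as 32767, so combine two calls. */
        unsigned long r = ((unsigned long)rand() << 15) ^ (unsigned long)rand();
        long off = skip + (long)(r % (unsigned long)(size - skip));
        if (fseek(f, off, SEEK_SET) != 0)
            break;
        if (fputc(rand() & 0xFF, f) == EOF)
            break;
        written++;
    }
    fclose(f);
    return written;
}
```

Calling it as `corrupt_image("base.img", 8192, 4096, seed)` with a different seed per loop iteration would mirror the setup described; the file name here is only an example.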

I then ran a loop where I created a 1G base image, populated it, fuzzed it
in this way (so up to 3% of blocks were damaged), ran the filesystem's
fsck utility (in btrfs' case, btrfsck --repair), and then tried to mount it
(in btrfs' case, with bare mount, then -o usebackuproot if mount failed).
If it mounted, I used "find | wc" to see how many files were reachable vs
the original image.

If either fsck or mount reports an exit code that reflects failure to
complete properly, I recorded that.
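What counts as "an exit code that reflects failure" depends on each tool's conventions. Here is a C sketch, assuming the fsck(8) bitmask (which e2fsck follows: 1 = errors corrected, 2 = reboot needed, 4 = errors left uncorrected, 8 and up = operational, usage or library errors). For tools that do not follow it (assumed here for btrfs check), any nonzero status is treated as failure. `fsck_failed`, `classify_run` and the `RUN_*` flags are hypothetical names, and the actual script may classify differently.

```c
#include <assert.h>

/* Result flags for one fuzz iteration (hypothetical names). */
#define RUN_FSCK_FAILED  0x1u
#define RUN_MOUNT_FAILED 0x2u

/* fsck(8) exit status is a bitmask: 1 = errors corrected, 2 = reboot needed,
 * 4 = errors left uncorrected, 8 = operational error, 16 = usage error,
 * 32 = cancelled by user, 128 = shared-library error.
 * With fsck8_style set, statuses 0-3 count as success (errors were fixed);
 * otherwise (assumed for btrfs check) any nonzero status is a failure. */
int fsck_failed(int status, int fsck8_style)
{
    assert(status >= 0);
    if (fsck8_style)
        return (status & ~3) != 0;
    return status != 0;
}

/* Combine one iteration's exit statuses. bare_rc is the plain mount's status;
 * retry_rc is the "-o usebackuproot" retry, only consulted when bare_rc != 0.
 * mount(8) returns 0 on success and 32 on mount failure. */
unsigned classify_run(int fsck_status, int fsck8_style, int bare_rc, int retry_rc)
{
    unsigned flags = 0;
    if (fsck_failed(fsck_status, fsck8_style))
        flags |= RUN_FSCK_FAILED;
    if (bare_rc != 0 && retry_rc != 0)
        flags |= RUN_MOUNT_FAILED;
    return flags;
}
```

Under this policy an e2fsck status of 1 ("errors corrected") is a success, which matters because a fuzzed ext4 image will almost always need some repair.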

It was a quick hack, and it's not beautiful, so there are probably holes
to be poked in it; if you want to look, I threw the bash script and the C
source up at https://people.redhat.com/esandeen/fsckfuzzer/

Running 10 loops on each of btrfs, ext4, and xfs, I got results that look
like this (ext4 always creates an empty lost+found, so it will always find at
least 1 file there):

btrfs

fsck failed
0 files in lost+found, 628 files gone/unreachable
0 files in lost+found, 0 files gone/unreachable
526 files in lost+found, 9 files gone/unreachable
595 files in lost+found, 55 files gone/unreachable
53 files in lost+found, 8 files gone/unreachable
57 files in lost+found, 44 files gone/unreachable
fsck failed
7 files in lost+found, 1491 files gone/unreachable
fsck failed, mount failed
fsck failed, mount failed
88 files in lost+found, 40 files gone/unreachable
== 4 fsck failures, 2 mount failures

ext4

1 files in lost+found, 0 files gone/unreachable
1 files in lost+found, 0 files gone/unreachable
164 files in lost+found, 2 files gone/unreachable
1 files in lost+found, 0 files gone/unreachable
1 files in lost+found, 0 files gone/unreachable
1 files in lost+found, 1 files gone/unreachable
1 files in lost+found, 0 files gone/unreachable
9 files in lost+found, 1 files gone/unreachable
1 files in lost+found, 0 files gone/unreachable
1 files in lost+found, 0 files gone/unreachable
== 0 fsck failures, 0 mount failures

xfs

0 files in lost+found, 1 files gone/unreachable
0 files in lost+found, 0 files gone/unreachable
958 files in lost+found, 629 files gone/unreachable
0 files in lost+found, 0 files gone/unreachable
2 files in lost+found, 0 files gone/unreachable
0 files in lost+found, 1 files gone/unreachable
0 files in lost+found, 0 files gone/unreachable
0 files in lost+found, 0 files gone/unreachable
8 files in lost+found, 1 files gone/unreachable
3 files in lost+found, -1 files gone/unreachable
== 0 fsck failures, 0 mount failures
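The "== N fsck failures, M mount failures" summary lines can be rederived from the per-run lines. Here is a minimal C sketch, assuming exactly the output format shown above. `struct tally` and `tally_line` are hypothetical and not taken from the bash script. The sketch also sums the two per-run counts. A negative value such as the xfs "-1" parses fine and simply lowers the sum.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct tally {
    int fsck_failures;
    int mount_failures;
    long lost_found;   /* sum of "files in lost+found" */
    long unreachable;  /* sum of "files gone/unreachable" */
};

/* Feed one per-run result line into the tally. A line may report an fsck
 * failure, a mount failure, both, or the two file counts. */
void tally_line(struct tally *t, const char *line)
{
    long found, gone;
    assert(t != NULL && line != NULL);
    if (strncmp(line, "fsck failed", 11) == 0)
        t->fsck_failures++;
    if (strstr(line, "mount failed") != NULL)
        t->mount_failures++;
    if (sscanf(line, "%ld files in lost+found, %ld files gone/unreachable",
               &found, &gone) == 2) {
        t->lost_found += found;
        t->unreachable += gone;
    }
}
```

Feeding it the twelve btrfs result lines above (ten runs) yields 4 fsck failures and 2 mount failures, matching that summary line.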


Re: Fedora 33 System-Wide Change proposal: Make btrfs the default file system for desktop variants

2020-07-02 Thread Josef Bacik


On 7/1/20 9:49 PM, Chris Adams wrote:
> Once upon a time, Josef Bacik  said:
>> This sounds like a "wtf, why are you doing this btrfs?" sort of
>> thing, but this is just the reality of using checksums.  It's a
>> checksum, not ECC.  We don't know _which_ bits are fucked, we just
>> know somethings fucked, so we throw it all away.  If you have RAID
>> or DUP then we go read the other copy, and fix the broken copy if we
>> find a good copy.  If we don't, well then there's nothing really we
>> can do.
>
> That's where an fsck and a lost+found type directory should come into
> play.  Maybe punt to user space, but still try to see what you can make
> sense of to try to salvage.  If you are saying a single bit error in the
> wrong place can basically lop off a good chunk of a filesystem, then I'm
> going to say that's not an improvement in reliability.

We do, the recovery tools allow you to just ignore checksums.  This is 
specifically separate from everything else because there's the expectation of 
results.  The user is acknowledging that things are bad and the tools are going 
to do their very best.  If you know you only have a single bit off then hooray, 
you got everything back (probably), but if not then you don't.  Thanks,


Josef
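The point that a checksum detects damage but does not locate it, so recovery depends on a second copy, can be shown with a toy DUP/RAID1-style block. The C sketch below makes these assumptions: `struct dup_block`, the 64-byte block size and `read_with_repair` are hypothetical, the checksum is CRC-32C, and the checksum is stored apart from the data (btrfs keeps data checksums in a separate checksum tree).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLK 64  /* toy block size in bytes (hypothetical) */

/* Bitwise CRC-32C (reflected polynomial 0x82F63B78). */
static uint32_t crc32c(const uint8_t *p, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    while (len--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* One logical block stored twice; the checksum of the intended contents
 * is kept outside the data itself. */
struct dup_block {
    uint8_t copy[2][BLK];
    uint32_t csum;
};

/* Return the index of a mirror that verifies, copying it to `out` and
 * rewriting the other mirror if that one fails verification. Return -1 if
 * neither verifies: the checksum says something is wrong but not which bits,
 * so there is no good copy to repair from. */
int read_with_repair(struct dup_block *b, uint8_t out[BLK])
{
    assert(b != NULL && out != NULL);
    for (int i = 0; i < 2; i++) {
        if (crc32c(b->copy[i], BLK) != b->csum)
            continue;
        int other = 1 - i;
        if (crc32c(b->copy[other], BLK) != b->csum)
            memcpy(b->copy[other], b->copy[i], BLK);
        memcpy(out, b->copy[i], BLK);
        return i;
    }
    return -1;
}
```

With one mirror damaged by a single bit flip, the read is served from the other mirror and the bad copy is rewritten. When both mirrors fail verification the function can only report the error, which is the "nothing really we can do" case.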

Re: Fedora 33 System-Wide Change proposal: Make btrfs the default file system for desktop variants

2020-07-02 Thread Konstantin Kharlamov

On Thu, 2020-07-02 at 09:44 +0200, Florian Weimer wrote:
> * Konstantin Kharlamov:
> 
> > FWIW, I was just thinking about it, and I came up with example you
> > may like which shows exactly why BTRFS is bad for HDD. Consider
> > development process. It includes rewriting source files over and
> > over: you do `git checkout foo` and files are overwritten, you
> > change a file in text editor, and it gets overwritten. And since
> > BTRFS is CoW, it will always write files to a new place.
> 
> Editors that make a backup copy typically do not overwrite files in
> place.  They rename the file to the backup location and then write the
> new file.
> 
> git checkout unlinks changed files first, before writing them anew
> from scratch.
> 
> A COW file system does not make a difference for these use cases
> because there is already COW at the application level.
> 
> The GNU assembler truncates the output object file first.  On XFS,
> that triggers relocation to a new file system location as well, even
> if the output file size (or contents) does not change.  So that
> scenario is essentially COW as well today.

Per my understanding, what happens when you write a new file and delete an old
one is that the block the old file was occupying gets freed.

Then, if you copy the file again, the file system should find a free block to
write this copy into. And this block would likely be the one that got freed
previously.

So, well, it is indeed COW, but not the one BTRFS does. It's a COW that copies
a file back and forth between two blocks :) This is kinda HDD-friendly COW :)

BTRFS, on the other hand, will not rewrite an older block unless it's out of
new ones.
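The "COW at the application level" mentioned in the quoted reply can be seen in the rename-then-write pattern attributed to editors that keep a backup copy. Below is a minimal portable C sketch using only stdio, with `save_with_backup` as a hypothetical helper name. The old file is renamed to the backup name and the new contents go into a freshly created file, so nothing is overwritten in place whatever the file system. A real editor would also need to preserve permissions and may fsync the new file; that is omitted here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Keep the current `path` as `backup`, then write `contents` to a brand-new
 * file at `path`. The old data is never overwritten in place: the file
 * system allocates space for the new file from scratch.
 * Returns 0 on success, -1 on failure. */
int save_with_backup(const char *path, const char *backup, const char *contents)
{
    assert(path != NULL && backup != NULL && contents != NULL);
    if (rename(path, backup) != 0)
        return -1;
    FILE *f = fopen(path, "wb");
    if (!f)
        return -1;
    size_t len = strlen(contents);
    int ok = fwrite(contents, 1, len, f) == len;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

After a call, the backup holds the previous contents and `path` names a different, newly allocated file, which is why a COW file system adds little extra relocation for this workflow.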

Re: Fedora 33 System-Wide Change proposal: Make btrfs the default file system for desktop variants