On 6/27/20 2:57 AM, Nicolas Mailhot via devel wrote:
Le vendredi 26 juin 2020 à 12:30 -0400, Josef Bacik a écrit :
On 6/26/20 11:15 AM, Matthew Miller wrote:
On Fri, Jun 26, 2020 at 11:13:39AM -0400, Josef Bacik wrote:
Not Fedora land, but Facebook installs it on all of our root
devices, so millions of machines.  We've done this for 5 years.
It's worked out very well. Thanks,

Josef, I'd love to hear your comments on any differences between your
situation and the typical laptop-user case for Fedora desktop.
Anything we should consider?

We buy worse hardware than a typical laptop user uses, at least for
our hard drives.

The difference between an operation like Facebook and the Fedora user
base is that Facebook will have a huge fleet of crap hardware, with
the support teams to baby-sit the crap hardware, and attention to
reducing the variety of crap hardware to limit the support matrix
breadth, while Fedora has to deal with a huge support matrix breadth,
without the support teams and the support team tooling to baby-sit
hardware. (Besides, Facebook designs the levels of crappiness they allow
in their hardware, meaning they know exactly where they are pushing
limits to lower hardware costs).

And it’s not always the crap hardware that hits bugs. Sometimes
expensive gamer hardware will fail first because its manufacturer has
pushed the limits to eke some performance points over the competition.

Therefore, using btrfs in Fedora is inherently more ambitious than
using it at Facebook.

I've been very clear from the outset that Facebook's fault tolerance is much higher than that of the average Fedora user. The only reason I've agreed to assist in answering questions and support this proposal is that I have multi-year data showing our failure rates are the same as we see on every other file system, which is basically the failure rate of the disks themselves.

And I specifically point out the hardware that we use that most closely reflects the drives that an average Fedora user is going to have. We of course have a very wide variety of hardware. In fact, the very first things we deployed on were these expensive hardware RAID setups. Btrfs found bugs in that firmware that were silently corrupting data. These bugs had been corrupting AI test data for years under XFS, and Btrfs found them in a matter of days because of our checksumming.

We use all sorts of hardware, and have all sorts of stories like this. I agree that the hardware is going to be muuuuuch more varied with Fedora users, and that Facebook has muuuuch higher fault tolerance. But more production failures inside FB means more engineering time spent dealing with those failures, which translates to lost productivity. If btrfs were causing us to run around fixing it all the time then we wouldn't deploy it. The fact is that it's not; it's perfectly stable from our perspective. Thanks,

