Austin S Hemmelgarn posted on Fri, 04 Apr 2014 08:33:10 -0400 as
excerpted:

> On 2014-04-04 04:02, Swâmi Petaramesh wrote:
>> Hi,
>> 
>> I'm going to receive a new small laptop with a 500 GB 5400 RPM
>> mechanical "ole' rust" HD, and I plan to install BTRFS on it.

Reminds me of my query to the list, some months ago.  (Altho I was/am 
using dual 238 GiB SSDs, in btrfs raid1 mode both data and metadata, in a 
desktop, additionally with a 500 gig spinning rust drive for media that 
is still running reiserfs, so the details are somewhat different.)

>> It will have a kernel 3.13 for now, until 3.14 gets released.

$ uname -r
3.14.0

=:^)

But it's good you (SP) keep reasonably current.  I see people posting 
with old 2.6.* kernels and wonder why they're even bothering with btrfs, 
since they obviously aren't current, kernel-wise.

>> However I'm still concerned with chronic BTRFS dreadful performance and
>> still find that BRTFS degrades much over time even with periodic defrag
>> and "best practices" etc.

> I keep hearing this from people, but I personally don't see this to be
> the case at all.  I'm pretty sure the 'big' performance degradation that
> people are seeing is due to how they are using snapshots, not a result
> of using BTRFS itself (I don't use them for anything other than ensuring
> a stable system image for rsync and/or tar based backups).

I'll second what you (AH) and Hugo say elsewhere, and I've written some 
on the subject in other threads too.  Snapshots per se aren't bad, but 
there's really no reason to have thousands of them against the same base 
subvolume -- in practice, if you need to mount a snapshot a month or six 
months old, are you really going to know or care what exact minute to 
mount?

While I /personally/ think per-minute snapshots are overdoing it, per 
hour or so is definitely logically supportable and if you /want/ per-
minute, well, fine.  But per-minute or per-hour or per-day, or just 
taking an occasional manual snapshot, /do/ strongly consider thinning 
them out on a reasonable schedule, and the more frequently you take 'em 
the more you need to thin.  So if for example you're taking per-minute, 
thin them down to perhaps one per half-hour after six hours and one per 
hour after a day, then to one a day after a week and one a week after 
four weeks.  At some point between a month and a quarter, external 
backups should have taken over, and deleting older snapshots or only 
keeping perhaps one every 13 weeks (quarter) should suffice.
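
That schedule can be sketched as a tiny helper; the thresholds below are 
just the illustrative numbers from the paragraph above, nothing btrfs 
itself provides, and the function name is made up:

```shell
#!/bin/sh
# Sketch of the thinning schedule above.  Given a snapshot's age in
# minutes, print the interval (in minutes) at which snapshots of that
# age are worth keeping.  Thresholds are illustrative only.
keep_interval() {
    age=$1
    if   [ "$age" -lt 360 ];    then echo 1       # under 6 hours: keep the per-minute ones
    elif [ "$age" -lt 1440 ];   then echo 30      # under a day: one per half-hour
    elif [ "$age" -lt 10080 ];  then echo 60      # under a week: one per hour
    elif [ "$age" -lt 40320 ];  then echo 1440    # under four weeks: one per day
    elif [ "$age" -lt 131040 ]; then echo 10080   # under ~13 weeks: one per week
    else                             echo 131040  # beyond that: one per quarter
    fi
}

keep_interval 120      # a two-hour-old snapshot: still in the keep-all window
keep_interval 20000    # a two-week-old snapshot: daily is plenty
```

A real thinning script would then walk the snapshot list and `btrfs 
subvolume delete` anything spaced more tightly than its bucket's 
interval.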

Meanwhile, as Hugo hints there are still known issues with snapshots and 
large (half-gig-plus) frequently internally rewritten files such as VM 
images, databases, etc, even if set NOCOW.  If you're running something 
like this, strongly consider putting those files on a dedicated subvolume 
and using conventional backups instead of snapshotting for that 
subvolume.  (And set NOCOW using the directory inheritance mechanism 
described in other posts.)
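
For reference, the dedicated-subvolume-plus-inheritance setup looks 
something like this; /srv/vm is a hypothetical path on an existing btrfs 
filesystem, and the commands need root:

```shell
# Hypothetical path; assumes /srv is on btrfs and you have root.
# A dedicated subvolume keeps these files out of snapshots taken
# against the parent subvolume.
btrfs subvolume create /srv/vm

# Set NOCOW on the still-empty directory; files created inside it
# from now on inherit the flag.  It does NOT retroactively affect
# data already written to existing files -- copy those in afresh.
chattr +C /srv/vm
lsattr -d /srv/vm    # the 'C' attribute should now show
```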

For smaller stuff the autodefrag option should help.


>> So I'd like to start with the best possible options and have a few
>> questions :
>> 
>> - Is it still recommended to mkfs with a nodesize or leafsize different
>> (bigger) than the default ? I wouldn't like to lose too much disk space
>> anyway (1/2 nodesize per file on average ?), as it will be limited...

> This depends on many things, the average size of the files on the disk
> is the biggest factor.  In general you should get the best disk
> utilization [snip]

As Hugo says, btrfs' current nodesize settings, etc, apply to metadata, 
not data; the data block size is currently the standard 4K page-size on 
x86.  Metadata nodesize now defaults to 16K with newer mkfs.btrfs, which 
should be reasonable.  (There's work to make the data-block size 
configurable as well, in part because it's currently not possible to 
mount a btrfs created on an architecture with a different page size, tho 
luckily both arm and x86/amd64 use 4k pages, so they're compatible.)
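
Concretely, if you wanted to set it explicitly anyway (device name is a 
placeholder, of course):

```shell
# /dev/sdXn is a placeholder -- substitute your actual partition.
# This sets the metadata node size only; with recent btrfs-progs
# 16 KiB is already the default, and the data block size stays at
# the 4 KiB page size regardless.
mkfs.btrfs -n 16384 /dev/sdXn
```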

>> - Is it recommended to alter the FS to have "skinny extents" ? I've
>> done this on all of my BTRFS machines without problem, still the kernel
>> spits a notice at mount time, and I'm worrying kind of "Why is the
>> kernel warning me I have skinny extents ? Is it bad ? Is it something I
>> should avoid ?"

> I think that the primary reason for the warning is that it is backward
> incompatible, older kernels can't mount filesystems using it.

Agreed.  When skinny extents first came out there were some initial bugs, 
but I believe they've been worked out by now in general, so it shouldn't 
be a problem.  The big remaining issue is backward compatibility.

Tho at least here (where I've been running 3.14 pre-releases since before 
rc1), the on-mount skinny-extents message seems more informational than 
an actual warning.

That said, more conservative users might wish to stay with "fat" extents, 
since AFAIK that's still the default, so it's going to get the most 
testing.  FWIW, when I last re-did my partitions in order to take 
advantage of the 16k metadata node-sizes, etc (late kernel 3.13 cycle I 
think), I kept fat extents on root and home, but went with skinny extents 
on my packages partition.  I've seen no issues with it in my usage, and 
will probably go all skinny-extent the next time I redo my partitions.

>> - Are there other optimization tricks I should perform at mkfs time
>> because they can't be changed later on ?

I used -O extref on all my partitions here, when I redid them.  That's 
probably a good idea.
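
Something like the following, device name being a placeholder; mkfs.btrfs 
can list the feature names your btrfs-progs build knows about:

```shell
# Show the feature names this btrfs-progs build supports:
mkfs.btrfs -O list-all

# Extended inode refs (more hardlinks per directory) plus skinny
# metadata extents -- both are backward-incompatible, so older
# kernels won't mount the result.  /dev/sdXn is a placeholder.
mkfs.btrfs -O extref,skinny-metadata /dev/sdXn
```

(IIRC btrfstune -x is the switch that flips skinny extents on for an 
already-existing filesystem, which is presumably how you got that 
mount-time notice in the first place.)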

The -M (mixed data/metadata) option is interesting.  (Note the capital; 
lowercase -m sets the metadata profile instead.)  You probably don't 
want to use it on a 500 gig filesystem unless you partition up (tho some 
do for the dup mode benefit mentioned below).  It's the default on really 
small (gig and smaller) filesystems, but some people use it on 
filesystems up to 128 gig or so, for a couple of reasons.

Mixed mode helps avoid having to run a balance when data (typically) or 
metadata chunk allocations end up claiming all available space.  At 
present, btrfs can automatically allocate new chunks of either type as 
long as unallocated space remains, but it can't reallocate empty chunks 
from one type to the other without a rebalance.  With mixed mode all 
chunks can hold both data and metadata, eliminating the manual rebalance 
otherwise needed to return empty chunks to the unallocated pool for use 
by the other type.  But it DOES have a bit of a performance impact.

The other and arguably more interesting feature of mixed mode for single 
device filesystems is that it allows and in fact defaults to dup profile 
mode for the now mixed data/metadata chunks, inheriting that default (as 
well as the 256 MiB chunk size) from the metadata side.  Since unlike 
metadata, data chunks are otherwise limited to single profile mode, mixed-
mode is the only way (other than creating two partitions on the same 
hardware device and running btrfs raid1 on that, but that's less 
efficient, particularly on spinning rust) to fully apply btrfs data 
integrity benefits to data chunks on a single device.  Normally, in case 
of corruption btrfs scrub on a single device filesystem can only recover 
metadata, since those are the only chunks in dup mode.  But with mixed-
mode, data and metadata share the same chunks and thus can both be dup, 
thus allowing data to be recovered from the other copy if one copy goes 
bad, as well as metadata.
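
For concreteness, creating such a filesystem looks something like this 
(placeholder device again, and the commands need root):

```shell
# -M creates combined data+metadata block groups; on a single device
# the profile then defaults to dup, so scrub has a second copy of
# data (not just metadata) to repair from.  /dev/sdXn is a placeholder.
mkfs.btrfs -M /dev/sdXn

# After mounting, check the chunk layout; it should show a combined
# "Data+Metadata" line rather than separate Data and Metadata lines.
btrfs filesystem df /mnt
```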

For someone like me, where a big reason for using btrfs at all is the 
data integrity aspect (thus my running two SSDs configured in btrfs raid1 
mode for most partitions), if I were limited to a single hardware device 
(as I will be for my netbook, tho I've not actually converted it to btrfs 
yet), I might well consider mixed-mode for the benefit of dup-mode data 
as well as metadata, alone!

Tho of course that does effectively limit you to half capacity, since all 
data and metadata is duplicated.  And on spinning rust it's going to be a 
performance issue, tho it should be less of one doing it that way than it 
would be forcing it with two identical partitions on the same hardware 
disk and setting btrfs up in raid1 mode.

But if you /do/ use mixed-mode, as I implied above, you may wish to break 
up that 500 gig into multiple 128 gig or so partitions, each with its own 
btrfs, as I believe the performance cost will be lower that way than it 
would be with a single 500 gig mixed-mode single-device btrfs.  But do 
remember when you're setting up the partitions that dup mode means they 
get full with half the stuff they'd normally hold, and size the 
partitions accordingly!

>> - Are there other btrfstune or mount options I should pass before
>> starting to populate the FS with a system and data ?
> Unless you are using stuff like QEMU or Virtualbox, you should probably
> have autodefrag and space_cache on from the very start.

Agreed in general.  However, my experience is that space_cache is now the 
default, so you don't have to set that explicitly.

As for autodefrag, definitely strongly recommended, /except/ as mentioned 
for large (half-gig or larger) frequent-internal-rewrite files such as VM 
images and databases.  For large internal-write files I'd recommend 
putting them on their own dedicated subvolume (or fully separate 
partition) to avoid snapshotting, and setting up NOCOW for the affected 
directories.  (At some point individual subvolumes will be mountable with 
different options, and the entire dedicated subvolume could then be 
mounted with nodatacow.  But AFAIK that doesn't work yet; nodatacow 
currently applies to all subvolumes on the filesystem, not a good idea.  
So for now, NOCOW at the directory and file level, plus a dedicated 
subvolume to keep the NOCOW files out of snapshots, will have to do.)

Also noatime.  That's not btrfs specific, but especially if you're doing 
snapshots it has stronger implications on btrfs than on other 
filesystems.  Consider: if there hasn't been a whole lot of write 
activity between snapshots, atime updates can be a big part of the 
difference between one snapshot and the last, making snapshots far less 
space efficient than they might otherwise be.  So while noatime is always 
a good option to enable unless you're running something (like mutt) that 
really needs atime updates, it's REALLY a good option to enable on btrfs 
if you're doing snapshotting at all.
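
Pulling those together, a hypothetical fstab line might look like this 
(the UUID and mountpoint are placeholders, and space_cache is listed 
only for explicitness, since it's now the default anyway):

```shell
# /etc/fstab sketch -- UUID and mountpoint are placeholders.
# noatime:      avoid atime-only deltas between snapshots
# autodefrag:   background defrag of small rewritten files
# compress=lzo: cheap transparent compression
UUID=01234567-89ab-cdef-0123-456789abcdef  /home  btrfs  noatime,autodefrag,compress=lzo,space_cache  0  0
```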

>> - Generally speaking, does LZO compression improve or degrade
>> performance ? I'm not able to figure it out clearly.

> As long as your memory bandwidth is significantly higher than disk
> bandwidth (which is almost always the case, even with SSD's), this
> should provide at least some improvement with respect to IO involving
> large files.  Because you are using a traditional hard disk instead of
> an SSD, you might get better performance using zlib (assuming you don't
> mind slightly higher processor usage for IO to files larger than the
> leafsize).  If you care less about disk utilization than you do about
> performance, you might want to use compress_force instead of compress,
> as the performance boost comes from not having to write as much data to
> disk.

Agreed.  I'm using compress=lzo here, even on ssd.  I'd probably use zlib 
on spinning rust, and would then experiment with compress-force as well.
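
Experimenting is cheap here, since the compression setting can be 
switched with a remount; a sketch, assuming the filesystem is mounted at 
/mnt:

```shell
# Switch to zlib for better ratios at more CPU cost.  Only data
# written after the remount is affected; existing files keep the
# compression they were originally written with.
mount -o remount,compress=zlib /mnt

# Or force compression even for files btrfs would otherwise give
# up on after a failed first compression attempt:
mount -o remount,compress-force=zlib /mnt
```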

The other thing about compress, on a standard single-device filesystem 
with default dup metadata and default single data, is that when I tried 
it here at least (before I got the ssds and went raid1 mode), compress=lzo 
rather nicely offset (and then some, for my use-case) the extra space 
required by the duplicate metadata.

Come to think of it, depending on the compressibility of your data, 
compress=zlib (or possibly compress-force=zlib) might offset much of the 
duplicate space required for mixed-mode dup as well, thereby making it 
more practical.  Since on spinning rust the compression is also likely to 
offset to some degree the slowness of the spinning rust, that might be 
quite a reasonable tradeoff (tho write speeds would still likely be 
noticeably slower than single-data mode due to having to write out both 
copies).

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman
