Thank you all for the quick response. The server is running, but as I
said, the i/o performance is not as good as it should be. I also suspect
fragmentation is the issue, but I would still like to optimise my config
and, if possible, keep this server running with acceptable performance,
so let me answer the questions below.

So, as far as I can see, the action plan is the following (a rough
command sketch for these steps is included below the list):
- Enable the v2 space_cache. Is this safe/stable enough?
- Run defrag on the old data. I suppose it will run for weeks, but I'm OK
with that if the server can run smoothly during the process.
- Is compress=zstd the recommended mount option? Does it perform better
than the default?
- After the defrag, I'm also thinking of compressing my logs with
traditional gzip and turning off on-the-fly compression (is this a huge
performance gain?)
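
For reference, here is a rough sketch of the defrag and gzip steps I have
in mind (the paths and the age cutoff are only examples, and -czstd
assumes the kernel and btrfs-progs are new enough to support zstd):

  # recursive defragment, recompressing existing data with zstd as it goes;
  # this rewrites the data, so heavy i/o is expected while it runs
  btrfs filesystem defragment -r -v -czstd /var

  # afterwards, older logs could be gzipped so that on-the-fly compression
  # no longer matters for them (the 30-day cutoff is just an example)
  find /var/log -type f -name '*.log' -mtime +30 -exec gzip {} \;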

Any other suggestions?

Thank you
Laszlo
---

uname -a
3.10.0-1160.6.1.el7.x86_64 #1 SMP Tue Nov 17 13:59:11 UTC 2020 x86_64
x86_64 x86_64 GNU/Linux

  btrfs --version
  btrfs-progs v4.9.1

  btrfs fi show
  Label: 'centos'  uuid: 7017204b-1582-4b4e-ad04-9e55212c7d46
Total devices 2 FS bytes used 4.03TiB
devid    1 size 491.12GiB used 119.02GiB path /dev/sda2
devid    2 size 4.50TiB used 4.14TiB path /dev/sdb1

  btrfs fi df
  btrfs fi df /var
Data, single: total=4.09TiB, used=3.96TiB
System, RAID1: total=8.00MiB, used=464.00KiB
Metadata, RAID1: total=81.00GiB, used=75.17GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

  dmesg > dmesg.log
  dmesg|grep -i btrfs
  [  491.729364] BTRFS warning (device sdb1): block group
4619266686976 has wrong amount of free space
  [  491.729371] BTRFS warning (device sdb1): failed to load free
space cache for block group 4619266686976, rebuilding it now

  CPU type and model
  processor : 11
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Xeon(R) CPU           E5540  @ 2.53GHz
stepping : 4
microcode : 0x1d
cpu MHz : 2533.423
cache size : 8192 KB
12 vCPUs on ESXi

how much memory
48 GB RAM

type and model of hard disk
virtualized Fujitsu RAID on ESXi

is it raid
yes, the underlying virtualization provides redundancy, no sw RAID

Kernel version
3.10.0-1160.6.1.el7.x86_64

your btrfs mount options probably in /etc/fstab
UUID=7017204b-1582-4b4e-ad04-9e55212c7d46 /
btrfs   defaults,noatime,autodefrag,subvol=root     0 0
UUID=7017204b-1582-4b4e-ad04-9e55212c7d46 /var
btrfs   defaults,subvol=var,noatime,autodefrag      0 0

size of log files
4.5 TB on /var

do you have snapshots
no

have you tried tools like dedup remover
not yet
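
(If I do try dedup later, I guess it would be with something like the
duperemove tool, run against a subset of the logs first; the path here
is just a placeholder and I have not tested this:)

  # trial dedup run on one log subdirectory: -r = recurse,
  # -d = actually submit the dedupe requests instead of only reporting
  duperemove -r -d /var/log/some-subdir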

things to do

1. Kernel update: the LTS kernel has been updated to 5.10 (you may have
to install it manually, because CentOS will be dropped) -> reboot.
You may have to remove your mount point from fstab, boot into the
system, and then mount it manually.
Is this absolutely necessary?

2. set mount options in fstab
    defaults,autodefrag,space_cache=v2,compress=zstd (autodefrag only on HDD)
    defaults,ssd,space_cache=v2,compress=zstd (for ssd)

  autodefrag is already enabled. Is the v2 space_cache safe enough? (A
sketch of the resulting fstab lines is below.)
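
If I apply this, I assume my two fstab lines would end up like the
following (keeping noatime and the subvolumes; I still need to confirm
that my current kernel actually supports space_cache=v2 and zstd):

UUID=7017204b-1582-4b4e-ad04-9e55212c7d46 /
btrfs   defaults,noatime,autodefrag,space_cache=v2,compress=zstd,subvol=root     0 0
UUID=7017204b-1582-4b4e-ad04-9e55212c7d46 /var
btrfs   defaults,noatime,autodefrag,space_cache=v2,compress=zstd,subvol=var      0 0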

3. sudo btrfs scrub start /dev/sda (use your device)
    watch sudo btrfs scrub status /dev/sda (watch and wait until finished)

4. sudo btrfs device stats /dev/sda (your disk)

5. Install smartmontools,
   run sudo smartctl -x /dev/sda (use your disk),
   and check the output.
I think this is not applicable because this is a virtual disk.

On Tue, Feb 16, 2021 at 8:17 AM Nikolay Borisov <nbori...@suse.com> wrote:
>
>
>
> On 15.02.21 at 16:53, Pal, Laszlo wrote:
> > Hi,
> >
> > I'm not sure this is the right place to ask, but let me try :) I have
> > a server where I'm mainly using btrfs because of the built-in
> > compression feature. This is a central log server, storing logs from
> > tens of thousands of devices as text files, in millions of files
> > spread across thousands of directories.
> >
> > I've started to think it was not the best idea to choose btrfs for this :)
> >
> > The performance of this server was always worse than others where I
> > don't use btrfs, but I thought that was just because of the i/o
> > overhead of compression and the not-so-good ESX host providing the
> > disk to this machine. But now even rm'ing a single file takes ages,
> > so something is definitely wrong. So I'm looking for recommendations
> > for an environment like this, where the data-security features of
> > btrfs are less important than performance.
> >
> > I have been searching the net for comprehensive performance
> > documentation for months, but I haven't found any so far.
> >
> > Thank you in advance
> > Laszlo
> >
>
> You are likely suffering from fragmentation issues. Given that you hold
> log files, I'd assume you do a lot of small writes, each of which results
> in a CoW operation that allocates space. This increases the size of the
> metadata tree, and since you are likely using hard drives, seeking is
> slow. To ascertain whether that's really the case, I'd advise you to
> show the output of the following commands:
>
> btrfs fi usage <mountpoint> - this will show the total used space on the
> filesystem.
>
> Then run: btrfs inspect-internal dump-tree -t5 </dev/xxx> | grep -c EXTENT_DATA
> which will show how many data extents there are in the filesystem.
>
> Subsequently run: btrfs inspect-internal dump-tree -t5 </dev/xxx> | grep -c leaf
> which will show how many leaves there are in the filesystem.
>
> Then you have 2 options:
>
> a) Use btrfs defragment to actually rewrite leaves and bring them closer
> together, so that seeks become somewhat cheaper.
>
> b) Rewrite the log files by copying them with no reflinks, so that instead
> of one file consisting of multiple small extents, each file consists of
> one large extent. With your use case I'd assume you also want nocow to be
> enabled; unfortunately nodatacow precludes using compression.
>
>
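
If I go with option (b) from the reply above, I suppose rewriting an
existing log file without reflinks and marking a directory nodatacow for
new files would look roughly like this (the paths are placeholders and I
have not tried it yet; chattr +C only affects files created after it is
set, and nodatacow also disables compression and checksumming for those
files):

  # rewrite one log file as a fresh, non-reflinked copy, then swap it in
  cp --reflink=never /var/log/devices/host01.log /var/log/devices/host01.log.new
  mv /var/log/devices/host01.log.new /var/log/devices/host01.log

  # mark a directory nodatacow so newly created files in it avoid
  # CoW-induced fragmentation
  chattr +C /var/log/devices/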
