Re: Btrfs + compression = slow performance and high cpu usage

2017-08-31 Thread Konstantin V. Gavrilenko
Hello again list. I thought I would clear things up and describe what is
happening with my troubled RAID setup.

Having received help from the list, I initially ran a full defragmentation of
all the data and recompressed everything with zlib. That didn't help. Then I
ran a full rebalance of the data, and that didn't help either.

So I had to take a disk out of the RAID, copy all the data onto it, recreate
the RAID volume with a 32 KB chunk (strip) size and a 96 KB data stripe, and
copy the data back. Then I added the disk back and resynced the RAID.


So currently the RAID device is 

Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name:
RAID Level  : Primary-5, Secondary-0, RAID Level Qualifier-3
Size: 21.830 TB
Sector Size : 512
Is VD emulated  : Yes
Parity Size : 7.276 TB
State   : Optimal
Strip Size  : 32 KB
Number Of Drives: 4
Span Depth  : 1
Default Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy   : Disk's Default
Encryption Type : None
Bad Blocks Exist: No
Is VD Cached: No


It is about 40% full with compressed data
# btrfs fi usage /mnt/arh-backup1/
Overall:
Device size:  21.83TiB
Device allocated:  8.98TiB
Device unallocated:   12.85TiB
Device missing:  0.00B
Used:  8.98TiB
Free (estimated): 12.85TiB  (min: 6.43TiB)
Data ratio:   1.00
Metadata ratio:   2.00
Global reserve:  512.00MiB  (used: 0.00B)


I've decided to run a set of tests, where a 5 GB file was created using
different block sizes and different flags. One file was generated with urandom
data and another one was filled with zeroes. The data was written with and
without compression, and it seems that without compression it is possible to
gain 30-40% in speed, while the CPU was running at about 50% idle during the
highest loads.
dd write speeds (MB/s)

flags: conv=fsync
          compress-force=zlib      compress-force=none
          RAND     ZERO            RAND     ZERO
bs1024k   387      407             584      577
bs512k    389      414             532      547
bs256k    412      409             558      585
bs128k    412      403             572      583
bs64k     409      419             563      574
bs32k     407      404             569      572


flags: oflag=sync
          compress-force=zlib      compress-force=none
          RAND     ZERO            RAND     ZERO
bs1024k   86.1     97.0            203      210
bs512k    50.6     64.4            85.0     170
bs256k    25.0     29.8            67.6     67.5
bs128k    13.2     16.4            48.4     49.8
bs64k     7.4      8.3             24.5     27.9
bs32k     3.8      4.1             14.0     13.7


flags: no flags
          compress-force=zlib      compress-force=none
          RAND     ZERO            RAND     ZERO
bs1024k   480      419             681      595
bs512k    422      412             633      585
bs256k    413      384             707      712
bs128k    414      387             695      704
bs64k     482      467             622      587
bs32k     416      412             610      598
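
For reference, a minimal sketch of the kind of loop that could produce the
numbers above; the mount point, file names and the staging file in /tmp are
assumptions, not the exact commands that were run:

  # Sketch only: total size per run is 5 GiB (count * bs), repeated for each
  # flag set (conv=fsync, oflag=sync, no flags) with the volume mounted with
  # and without compress-force=zlib.
  MNT=/mnt/arh-backup1
  dd if=/dev/urandom of=/tmp/rand5g bs=1M count=5120      # random source, generated once
  for bs in 1024 512 256 128 64 32; do
      count=$((5 * 1024 * 1024 / bs))                     # keep count * bs = 5 GiB
      dd if=/tmp/rand5g of=$MNT/test-rand-${bs}k bs=${bs}k count=$count conv=fsync
      dd if=/dev/zero   of=$MNT/test-zero-${bs}k bs=${bs}k count=$count conv=fsync
  done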


I have also run a test where I filled the array to about 97% capacity and the 
write speed went down by about 50% compared with the empty RAID.


thanks for the help. 

- Original Message -
From: "Peter Grandi" <p...@btrfs.list.sabi.co.uk>
To: "Linux fs Btrfs" <linux-btrfs@vger.kernel.org>
Sent: Tuesday, 1 August, 2017 10:09:03 PM
Subject: Re: Btrfs + compression = slow performance and high cpu usage

>> [ ... ] a "RAID5 with 128KiB writes and a 768KiB stripe
>> size". [ ... ] several back-to-back 128KiB writes [ ... ] get
>> merged by the 3ware firmware only if it has a persistent
>> cache, and maybe your 3ware does not have one,

> KOS: No, I don't have a persistent cache, only the 512 MB cache
> on board the controller, which is battery-backed (BBU).

If it is a persistent cache, which can be battery-backed (as I
wrote, but it seems that you don't have too much time to read
replies), then the size of the write, 128KiB or not, should not
matter much; the write will be reported complete when it hits
the persistent cache (whichever technology it uses), and then
the HA firmware will spill write-cached data to the disks using
the optimal operation width.

Unless the 3ware firmware is really terrible (and depending on
model and vintage it can be amazingly terrible) or the battery
is no longer recharging and then the host adapter switches to
write-through.

That you see very different rates between uncompressed and
compressed writes, where the main difference is the limitation
on the segment size, seems to indicate that compressed writes
involve a lot of RMW, that is sub-stripe updates. As I mentioned
already, it would be interesting to retry 'dd' with different
'bs' values without compression and with 'sync' (or 'direct'
which only makes sense without compression).

> If I had additional SSD caching o

Re: Btrfs + compression = slow performance and high cpu usage

2017-08-01 Thread Peter Grandi
[ ... ]

> This is the "storage for beginners" version, what happens in
> practice however depends a lot on specific workload profile
> (typical read/write size and latencies and rates), caching and
> queueing algorithms in both Linux and the HA firmware.

To add a bit of slightly more advanced discussion, the main
reason for larger strips ("chunk size") is to avoid the huge
latencies of disk rotation using unsynchronized disk drives, as
detailed here:

  http://www.sabi.co.uk/blog/12-thr.html?120310#120310

That relates weakly to Btrfs.


Re: Btrfs + compression = slow performance and high cpu usage

2017-08-01 Thread Peter Grandi
>> [ ... ] a "RAID5 with 128KiB writes and a 768KiB stripe
>> size". [ ... ] several back-to-back 128KiB writes [ ... ] get
>> merged by the 3ware firmware only if it has a persistent
>> cache, and maybe your 3ware does not have one,

> KOS: No, I don't have a persistent cache, only the 512 MB cache
> on board the controller, which is battery-backed (BBU).

If it is a persistent cache, which can be battery-backed (as I
wrote, but it seems that you don't have too much time to read
replies), then the size of the write, 128KiB or not, should not
matter much; the write will be reported complete when it hits
the persistent cache (whichever technology it uses), and then
the HA firmware will spill write-cached data to the disks using
the optimal operation width.

Unless the 3ware firmware is really terrible (and depending on
model and vintage it can be amazingly terrible) or the battery
is no longer recharging and then the host adapter switches to
write-through.

That you see very different rates between uncompressed and
compressed writes, where the main difference is the limitation
on the segment size, seems to indicate that compressed writes
involve a lot of RMW, that is sub-stripe updates. As I mentioned
already, it would be interesting to retry 'dd' with different
'bs' values without compression and with 'sync' (or 'direct'
which only makes sense without compression).

> If I had additional SSD caching on the controller I would have
> mentioned it.

So far you had not mentioned the presence of a BBU cache either,
which is equivalent, even if in one of your previous messages
(which I try to read carefully) there were these lines:

 Default Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
 Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU

So perhaps someone else would have checked long ago the status
of the BBU and whether the "No Write Cache if Bad BBU" case has
happened. If the BBU is still working and the policy is still
"WriteBack" then things are stranger still.

> I was also under the impression that in a situation where mostly
> extra large files will be stored on the array, a bigger strip
> size would indeed increase the speed, thus I went with the
> 256 KB strip size.

That runs counter to this simple story: suppose a program is
doing 64KiB IO:

* For *reads*, there are 4 data drives and the strip size is
  16KiB: the 64KiB will be read in parallel on 4 drives. If the
  strip size is 256KiB then the 64KiB will be read sequentially
  from just one disk, and 4 successive reads will be read
  sequentially from the same drive.

* For *writes* on a parity RAID like RAID5 things are much, much
  more extreme: the 64KiB will be written with 16KiB strips on a
  5-wide RAID5 set in parallel to 5 drives, with 4 strips being
  updated with RMW. But with 256KiB strips it will partially
  update 5 drives, because the stripe is 1024+256KiB, and it
  needs to do RMW, and four successive 64KiB writes will need to
  do that too, even if only one drive is updated. Usually for
  RAID5 there is an optimization that means that only the
  specific target drive and the parity drive(s) need RMW, but
  it is still very expensive.

This is the "storage for beginners" version, what happens in
practice however depends a lot on specific workload profile
(typical read/write size and latencies and rates), caching and
queueing algorithms in both Linux and the HA firmware.

> Would I be correct in assuming that a RAID strip size of 128
> KB would be a better choice if one plans to use Btrfs with
> compression?

That would need to be tested, because of "depends a lot on
specific workload profile, caching and queueing algorithms", but
my expectation is that the lower the better. Given that you have
4 drives giving a 3+1 RAID set, perhaps a 32KiB or 64KiB strip
size, giving a data stripe size of 96KiB or 192KiB, would be
better.
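
A sketch of how such a comparison could be set up without recreating the
hardware virtual drive for every strip size (the device names here are
hypothetical): an MD RAID5 built from partitions on the same disks, with a
32 KiB chunk, gives the 96 KiB data stripe suggested above and can be
benchmarked the same way:

  # Sketch only: /dev/sd{b,c,d,e}1 are assumed spare partitions.
  mdadm --create /dev/md/strip32 --level=5 --raid-devices=4 --chunk=32 \
        /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1
  mkfs.btrfs -f /dev/md/strip32
  mount -t btrfs -o compress-force=zlib /dev/md/strip32 /mnt/strip32
  dd if=/dev/urandom of=/mnt/strip32/testfile bs=128k count=40960 conv=fsync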


Re: Btrfs + compression = slow performance and high cpu usage

2017-08-01 Thread Konstantin V. Gavrilenko
- Original Message -
From: "Peter Grandi" <p...@btrfs.list.sabi.co.uk>
To: "Linux fs Btrfs" <linux-btrfs@vger.kernel.org>
Sent: Tuesday, 1 August, 2017 3:14:07 PM
Subject: Re: Btrfs + compression = slow performance and high cpu usage

> Peter, I don't think the filefrag is showing the correct
> fragmentation status of the file when the compression is used.



As I wrote, "their size is just limited by the compression code"
which results in "128KiB writes". On a "fresh empty Btrfs volume"
the compressed extents limited to 128KiB also happen to be pretty
physically contiguous, but on a more fragmented free space list
they can be more scattered.

KOS: OK, thanks for pointing it out. I have compared the filefrag -v output on
another btrfs filesystem that is not fragmented, and I can see the difference
from what is happening on the sluggish one.

5824:   186368..  186399: 2430093383..2430093414: 32: 2430093414: encoded
5825:   186400..  186431: 2430093384..2430093415: 32: 2430093415: encoded
5826:   186432..  186463: 2430093385..2430093416: 32: 2430093416: encoded
5827:   186464..  186495: 2430093386..2430093417: 32: 2430093417: encoded
5828:   186496..  186527: 2430093387..2430093418: 32: 2430093418: encoded
5829:   186528..  186559: 2430093388..2430093419: 32: 2430093419: encoded
5830:   186560..  186591: 2430093389..2430093420: 32: 2430093420: encoded



As I already wrote the main issue here seems to be that we are
talking about a "RAID5 with 128KiB writes and a 768KiB stripe
size". On MD RAID5 the slowdown because of RMW seems only to be
around 30-40%, but it looks like several back-to-back 128KiB
writes get merged by the Linux IO subsystem (not sure whether
that's thoroughly legal), and perhaps they get merged by the 3ware
firmware only if it has a persistent cache, and maybe your 3ware
does not have one, but you have kept your counsel as to that.


KOS: No, I don't have a persistent cache, only the 512 MB cache on board the
controller, which is battery-backed (BBU). If I had additional SSD caching on
the controller I would have mentioned it.

I was also under the impression that in a situation where mostly extra large
files will be stored on the array, a bigger strip size would indeed increase
the speed, thus I went with the 256 KB strip size. Would I be correct in
assuming that a RAID strip size of 128 KB would be a better choice if one
plans to use Btrfs with compression?

thanks,
kos






Re: Btrfs + compression = slow performance and high cpu usage

2017-08-01 Thread Peter Grandi
> Peter, I don't think the filefrag is showing the correct
> fragmentation status of the file when the compression is used.

As reported in a previous message, the output of 'filefrag -v'
can be used to see what is going on:

 filefrag /mnt/sde3/testfile 
   /mnt/sde3/testfile: 49287 extents found

 Most of the latter extents are mercifully rather contiguous, their
 size is just limited by the compression code, here is an extract
 from 'filefrag -v' from around the middle:

   24757:  1321888.. 1321919:   11339579..  11339610: 32:   11339594:
   24758:  1321920.. 1321951:   11339597..  11339628: 32:   11339611:
   24759:  1321952.. 1321983:   11339615..  11339646: 32:   11339629:
   24760:  1321984.. 1322015:   11339632..  11339663: 32:   11339647:
   24761:  1322016.. 1322047:   11339649..  11339680: 32:   11339664:
   24762:  1322048.. 1322079:   11339667..  11339698: 32:   11339681:
   24763:  1322080.. 1322111:   11339686..  11339717: 32:   11339699:
   24764:  1322112.. 1322143:   11339703..  11339734: 32:   11339718:
   24765:  1322144.. 1322175:   11339720..  11339751: 32:   11339735:
   24766:  1322176.. 1322207:   11339737..  11339768: 32:   11339752:
   24767:  1322208.. 1322239:   11339754..  11339785: 32:   11339769:
   24768:  1322240.. 1322271:   11339771..  11339802: 32:   11339786:
   24769:  1322272.. 1322303:   11339789..  11339820: 32:   11339803:

 But again this is on a fresh empty Btrfs volume.

As I wrote, "their size is just limited by the compression code"
which results in "128KiB writes". On a "fresh empty Btrfs volume"
the compressed extents limited to 128KiB also happen to be pretty
physically contiguous, but on a more fragmented free space list
they can be more scattered.

As I already wrote the main issue here seems to be that we are
talking about a "RAID5 with 128KiB writes and a 768KiB stripe
size". On MD RAID5 the slowdown because of RMW seems only to be
around 30-40%, but it looks like several back-to-back 128KiB
writes get merged by the Linux IO subsystem (not sure whether
that's thoroughly legal), and perhaps they get merged by the 3ware
firmware only if it has a persistent cache, and maybe your 3ware
does not have one, but you have kept your counsel as to that.

My impression is that you read the Btrfs documentation and my
replies with a lot less attention than I write them. Some of the
things you have done and said make me think that you did not read
https://btrfs.wiki.kernel.org/index.php/Compression and 'man 5
btrfs', for example:

   "How does compression interact with direct IO or COW?

 Compression does not work with DIO, does work with COW and
 does not work for NOCOW files. If a file is opened in DIO
 mode, it will fall back to buffered IO.

   Are there speed penalties when doing random access to a
   compressed file?

 Yes. The compression processes ranges of a file of maximum
 size 128 KiB and compresses each 4 KiB (or page-sized) block
 separately."

> I am currently defragmenting that mountpoint, ensuring that
> everything is compressed with zlib.

Defragmenting the used space might help find more contiguous
allocations.

> p.s. any other suggestions that might help with the fragmentation
> and data allocation? Should I try to rebalance the data on the
> drive?

Yes, regularly, as that defragments the unused space.
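
A sketch of how that can be done periodically without rewriting everything
(the usage thresholds are arbitrary examples; the mount point is the one from
earlier in the thread):

  # Rebalance only data/metadata chunks that are at most half full; this
  # compacts them and returns the freed chunks to the unallocated pool.
  btrfs balance start -dusage=50 -musage=50 /mnt/arh-backup1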


RE: Btrfs + compression = slow performance and high cpu usage

2017-08-01 Thread Paul Jones
> -Original Message-
> From: linux-btrfs-ow...@vger.kernel.org [mailto:linux-btrfs-
> ow...@vger.kernel.org] On Behalf Of Konstantin V. Gavrilenko
> Sent: Tuesday, 1 August 2017 7:58 PM
> To: Peter Grandi <p...@btrfs.list.sabi.co.uk>
> Cc: Linux fs Btrfs <linux-btrfs@vger.kernel.org>
> Subject: Re: Btrfs + compression = slow performance and high cpu usage
> 
> Peter, I don't think the filefrag is showing the correct fragmentation status 
> of
> the file when the compression is used.
> At least the one that is installed by default in Ubuntu 16.04 -  e2fsprogs |
> 1.42.13-1ubuntu1
> 
> So for example, the fragmentation of the compressed file is 320 times higher
> than that of the uncompressed one.
> 
> root@homenas:/mnt/storage/NEW# filefrag test5g-zeroes
> test5g-zeroes: 40903 extents found
> 
> root@homenas:/mnt/storage/NEW# filefrag test5g-data
> test5g-data: 129 extents found

Compressed extents are about 128 KB, uncompressed extents are about 128 MB
(I can't remember the exact numbers).
I've had trouble with slow filesystems when using compression. The problem
seems to go away when removing compression.

Paul.








Re: Btrfs + compression = slow performance and high cpu usage

2017-08-01 Thread Konstantin V. Gavrilenko
Peter, I don't think filefrag is showing the correct fragmentation status
of the file when compression is used.
At least the one that is installed by default in Ubuntu 16.04 - e2fsprogs |
1.42.13-1ubuntu1

So for example, the fragmentation of the compressed file is 320 times higher
than that of the uncompressed one.

root@homenas:/mnt/storage/NEW# filefrag test5g-zeroes
test5g-zeroes: 40903 extents found

root@homenas:/mnt/storage/NEW# filefrag test5g-data 
test5g-data: 129 extents found


I am currently defragmenting that mountpoint, ensuring that everything is
compressed with zlib.
# btrfs fi defragment -rv -czlib /mnt/arh-backup 

My guess is that it will take another 24-36 hours to complete, and then I will
redo the test to see if that has helped.
I will keep the list posted.

P.S. Any other suggestions that might help with the fragmentation and data
allocation? Should I try to rebalance the data on the drive?

kos



- Original Message -
From: "Peter Grandi" <p...@btrfs.list.sabi.co.uk>
To: "Linux fs Btrfs" <linux-btrfs@vger.kernel.org>
Sent: Monday, 31 July, 2017 1:41:07 PM
Subject: Re: Btrfs + compression = slow performance and high cpu usage

[ ... ]

> grep 'model name' /proc/cpuinfo | sort -u 
> model name  : Intel(R) Xeon(R) CPU   E5645  @ 2.40GHz

Good, contemporary CPU with all accelerations.

> The sda device is a hardware RAID5 consisting of 4x8TB drives.
[ ... ]
> Strip Size  : 256 KB

So the full RMW data stripe length is 768KiB.

> [ ... ] don't see the previously reported behaviour of one of
> the kworker consuming 100% of the cputime, but the write speed
> difference between the compression ON vs OFF is pretty large.

That's weird; of course 'lzo' is a lot cheaper than 'zlib', but
in my test the much higher CPU time of the latter was spread
across many CPUs, while in your case it wasn't, even if the
E5645 has 6 CPUs and can do 12 threads. That seemed to point to
some high cost of finding free blocks, that is a very fragmented
free list, or something else.

> dd if=/dev/sdb  of=./testing count=5120 bs=1M status=progress oflag=direct
> 5368709120 bytes (5.4 GB, 5.0 GiB) copied, 26.0685 s, 206 MB/s

The results with 'oflag=direct' are not relevant, because Btrfs
behaves "differently" with that.

> mountflags: 
> (rw,relatime,compress-force=zlib,space_cache=v2,subvolid=5,subvol=/)
[ ... ]
> dd if=/dev/sdb  of=./testing count=5120 bs=1M status=progress conv=fsync
> 5368709120 bytes (5.4 GB, 5.0 GiB) copied, 77.4845 s, 69.3 MB/s
> mountflags: 
> (rw,relatime,compress-force=lzo,space_cache=v2,subvolid=5,subvol=/)
[ ... ]
> dd if=/dev/sdb  of=./testing count=5120 bs=1M status=progress conv=fsync
> 5368709120 bytes (5.4 GB, 5.0 GiB) copied, 122.321 s, 43.9 MB/s

That's pretty good for a RAID5 with 128KiB writes and a 768KiB
stripe size, on a 3ware, and it looks like the hw host adapter
does not have a persistent cache (battery backed usually). My
guess is that watching transfer rates and latencies with 'iostat
-dk -zyx 1' did not happen.

> mountflags: (rw,relatime,space_cache=v2,subvolid=5,subvol=/)
[ ... ]
> dd if=/dev/sdb  of=./testing count=5120 bs=1M status=progress conv=fsync
> 5368709120 bytes (5.4 GB, 5.0 GiB) copied, 10.1033 s, 531 MB/s

I had mentioned in my previous reply the output of 'filefrag'.
That to me seems relevant here, because of RAID5 RMW and maximum
extent size with Btrfs compression and strip/stripe size.

Perhaps redoing the tests with a 128KiB 'bs' *without*
compression would be interesting, perhaps even with 'oflag=sync'
instead of 'conv=fsync'.

It is hard for me to see a speed issue here with Btrfs: for
comparison I have done a simple test with a both a 3+1 MD RAID5
set with a 256KiB chunk size and a single block device on
"contemporary" 1T/2TB drives, capable of sequential transfer
rates of 150-190MB/s:

  soft#  grep -A2 sdb3 /proc/mdstat 
  md127 : active raid5 sde3[4] sdd3[2] sdc3[1] sdb3[0]
729808128 blocks super 1.0 level 5, 256k chunk, algorithm 2 [4/4] [UUUU]

with compression:

  soft#  mount -t btrfs -o commit=10,compress-force=zlib /dev/md/test5 
/mnt/test5   
  soft#  mount -t btrfs -o commit=10,compress-force=zlib /dev/sdg3 /mnt/sdg3
  soft#  rm -f /mnt/test5/testfile /mnt/sdg3/testfile

  soft#  /usr/bin/time dd iflag=fullblock if=/dev/sda6 of=/mnt/test5/testfile 
bs=1M count=10000 conv=fsync
  10000+0 records in
  10000+0 records out
  10485760000 bytes (10 GB) copied, 94.3605 s, 111 MB/s
  0.01user 12.59system 1:34.36elapsed 13%CPU (0avgtext+0avgdata 
2932maxresident)k
  13042144inputs+20482144outputs (3major+345minor)pagefaults 0swaps

  soft#  /usr/bin/time dd iflag=fullblock if=/dev/sda6 of=/mnt/sdg3/testfile 
bs=1M count=10000 conv=fsync
  10000+0 records in
  10000+0 records out
  10485760000 bytes (10 GB) copied, 93.5885 s, 112 MB/s
  0.03user 12.35syst

Re: Btrfs + compression = slow performance and high cpu usage

2017-07-31 Thread Peter Grandi
> [ ... ] It is hard for me to see a speed issue here with
> Btrfs: for comparison I have done a simple test with a both a
> 3+1 MD RAID5 set with a 256KiB chunk size and a single block
> device on "contemporary" 1T/2TB drives, capable of sequential
> transfer rates of 150-190MB/s: [ ... ]

The figures after this are a bit on the low side because I
realized looking at 'vmstat' that the source block device 'sda6'
was being a bottleneck, as the host has only 8GiB instead of the
16GiB I misremembered, and also 'sda' is a relatively slow flash
SSD whose reads top out at around 220MB/s. So I have redone the
simple tests with a transfer size of 3GB, which ensures that
all reads are from memory cache:

with compression:

  soft#  mount -t btrfs -o commit=10,compress-force=zlib /dev/md/test5 
/mnt/test5
  soft#  mount -t btrfs -o commit=10,compress-force=zlib /dev/sdg3 /mnt/sdg3
  soft#  rm -f /mnt/test5/testfile /mnt/sdg3/testfile   


  soft#  /usr/bin/time dd iflag=fullblock if=/dev/sda6 of=/mnt/test5/testfile 
bs=1M count=3000 conv=fsync
  3000+0 records in
  3000+0 records out
  3145728000 bytes (3.1 GB) copied, 15.8869 s, 198 MB/s
  0.00user 2.80system 0:15.88elapsed 17%CPU (0avgtext+0avgdata 3056maxresident)k
  0inputs+6148256outputs (0major+346minor)pagefaults 0swaps

  soft#  /usr/bin/time dd iflag=fullblock if=/dev/sda6 of=/mnt/sdg3/testfile 
bs=1M count=3000 conv=fsync
  3000+0 records in
  3000+0 records out
  3145728000 bytes (3.1 GB) copied, 16.9663 s, 185 MB/s
  0.00user 2.61system 0:16.96elapsed 15%CPU (0avgtext+0avgdata 3056maxresident)k
  0inputs+6144672outputs (0major+346minor)pagefaults 0swaps

  soft#  btrfs fi df /mnt/test5/ | grep Data

  Data, single: total=3.00GiB, used=2.28GiB
  soft#  btrfs fi df /mnt/sdg3 | grep Data
  Data, single: total=3.00GiB, used=2.28GiB

  soft#  filefrag /mnt/test5/testfile /mnt/sdg3/testfile
  /mnt/test5/testfile: 8811 extents found
  /mnt/sdg3/testfile: 8759 extents found

Slightly weird that with a 3GB size the number of extents is
almost double that for the 10GB, but I guess that depends on
speed.

Then without compression:

  soft#  mount -t btrfs -o commit=10 /dev/md/test5 /mnt/test5
  soft#  mount -t btrfs -o commit=10 /dev/sdg3 /mnt/sdg3
  soft#  rm -f /mnt/test5/testfile /mnt/sdg3/testfile

  soft#  /usr/bin/time dd iflag=fullblock if=/dev/sda6 of=/mnt/test5/testfile 
bs=1M count=3000 conv=fsync
  3000+0 records in
  3000+0 records out
  3145728000 bytes (3.1 GB) copied, 8.06841 s, 390 MB/s
  0.00user 3.90system 0:08.80elapsed 44%CPU (0avgtext+0avgdata 2880maxresident)k
  0inputs+6153856outputs (0major+345minor)pagefaults 0swaps

  soft#  /usr/bin/time dd iflag=fullblock if=/dev/sda6 of=/mnt/sdg3/testfile 
bs=1M count=3000 conv=fsync
  3000+0 records in
  3000+0 records out
  3145728000 bytes (3.1 GB) copied, 30.215 s, 104 MB/s
  0.00user 4.82system 0:30.93elapsed 15%CPU (0avgtext+0avgdata 2888maxresident)k
  0inputs+6152128outputs (0major+347minor)pagefaults 0swaps

  soft#  filefrag /mnt/test5/testfile /mnt/sdg3/testfile

  /mnt/test5/testfile: 5 extents found
  /mnt/sdg3/testfile: 3 extents found

Also added:

  soft#  rm -f /mnt/test5/testfile /mnt/sdg3/testfile   
   

  soft#  /usr/bin/time dd iflag=fullblock if=/dev/sda6 bs=128k count=3000 | dd 
iflag=fullblock of=/mnt/test5/testfile bs=128k oflag=sync
  3000+0 records in
  3000+0 records out
  393216000 bytes (393 MB) copied, 160.315 s, 2.5 MB/s
  0.02user 0.46system 2:40.31elapsed 0%CPU (0avgtext+0avgdata 1992maxresident)k
  0inputs+0outputs (0major+124minor)pagefaults 0swaps
  3000+0 records in
  3000+0 records out
  393216000 bytes (393 MB) copied, 160.365 s, 2.5 MB/s

  soft#  /usr/bin/time dd iflag=fullblock if=/dev/sda6 bs=128k count=3000 | dd 
iflag=fullblock of=/mnt/sdg3/testfile bs=128k oflag=sync
  3000+0 records in
  3000+0 records out
  393216000 bytes (393 MB) copied, 113.51 s, 3.5 MB/s
  0.02user 0.56system 1:53.51elapsed 0%CPU (0avgtext+0avgdata 2156maxresident)k
  0inputs+0outputs (0major+120minor)pagefaults 0swaps
  3000+0 records in
  3000+0 records out
  393216000 bytes (393 MB) copied, 113.544 s, 3.5 MB/s

  soft#  filefrag /mnt/test5/testfile /mnt/sdg3/testfile
   
  /mnt/test5/testfile: 1 extent found
  /mnt/sdg3/testfile: 22 extents found

  soft#  rm -f /mnt/test5/testfile /mnt/sdg3/testfile   
   

  soft#  /usr/bin/time dd iflag=fullblock if=/dev/sda6 bs=1M count=1000 | dd 
iflag=fullblock of=/mnt/test5/testfile bs=1M oflag=sync

Re: Btrfs + compression = slow performance and high cpu usage

2017-07-31 Thread Peter Grandi
[ ... ]

> Also added:

Feeling very generous :-) today, adding these too:

  soft#  mkfs.btrfs -mraid10 -draid10 -L test5 /dev/sd{b,c,d,e}3
  [ ... ]
  soft#  mount -t btrfs -o commit=10,compress-force=zlib /dev/sdb3 /mnt/test5

  soft#  rm -f /mnt/test5/testfile
  soft#  /usr/bin/time dd iflag=fullblock if=/dev/sda6 of=/mnt/test5/testfile 
bs=1M count=3000 conv=fsync
  3000+0 records in
  3000+0 records out
  3145728000 bytes (3.1 GB) copied, 14.2166 s, 221 MB/s
  0.00user 2.54system 0:14.21elapsed 17%CPU (0avgtext+0avgdata 3056maxresident)k
  0inputs+6144768outputs (0major+346minor)pagefaults 0swaps

  soft#  rm -f /mnt/test5/testfile
  soft#  /usr/bin/time dd iflag=fullblock if=/dev/sda6 of=/mnt/test5/testfile 
bs=128k count=3000 conv=fsync   
 
  3000+0 records in
  3000+0 records out
  393216000 bytes (393 MB) copied, 2.05933 s, 191 MB/s
  0.00user 0.32system 0:02.06elapsed 15%CPU (0avgtext+0avgdata 1996maxresident)k
  0inputs+772512outputs (0major+124minor)pagefaults 0swaps

  soft#  rm -f /mnt/test5/testfile
  soft#  /usr/bin/time dd iflag=fullblock if=/dev/sda6 bs=1M count=1000 | dd 
iflag=fullblock of=/mnt/test5/testfile bs=1M oflag=sync 
  
  1000+0 records in
  1000+0 records out
  1048576000 bytes (1.0 GB) copied, 60.6019 s, 17.3 MB/s
  0.01user 1.04system 1:00.60elapsed 1%CPU (0avgtext+0avgdata 2888maxresident)k
  0inputs+0outputs (0major+348minor)pagefaults 0swaps
  1000+0 records in
  1000+0 records out
  1048576000 bytes (1.0 GB) copied, 60.4116 s, 17.4 MB/s

  soft#  rm -f /mnt/test5/testfile
  soft#  /usr/bin/time dd iflag=fullblock if=/dev/sda6 bs=128k count=3000 | dd 
iflag=fullblock of=/mnt/test5/testfile bs=128k oflag=sync
  3000+0 records in
  3000+0 records out
  393216000 bytes (393 MB) copied, 148.04 s, 2.7 MB/s
  0.00user 0.62system 2:28.04elapsed 0%CPU (0avgtext+0avgdata 1996maxresident)k
  0inputs+0outputs (0major+125minor)pagefaults 0swaps
  3000+0 records in
  3000+0 records out
  393216000 bytes (393 MB) copied, 148.083 s, 2.7 MB/s

  soft#  sysctl vm/drop_caches=3
  vm.drop_caches = 3
  soft#  /usr/bin/time dd iflag=fullblock if=/mnt/test5/testfile bs=128k 
count=3000 of=/dev/zero 
  
  3000+0 records in
  3000+0 records out
  393216000 bytes (393 MB) copied, 1.09729 s, 358 MB/s
  0.00user 0.24system 0:01.10elapsed 23%CPU (0avgtext+0avgdata 2164maxresident)k
  459768inputs+0outputs (3major+121minor)pagefaults 0swaps


Re: Btrfs + compression = slow performance and high cpu usage

2017-07-31 Thread Peter Grandi
[ ... ]

> grep 'model name' /proc/cpuinfo | sort -u 
> model name  : Intel(R) Xeon(R) CPU   E5645  @ 2.40GHz

Good, contemporary CPU with all accelerations.

> The sda device is a hardware RAID5 consisting of 4x8TB drives.
[ ... ]
> Strip Size  : 256 KB

So the full RMW data stripe length is 768KiB.

> [ ... ] don't see the previously reported behaviour of one of
> the kworker consuming 100% of the cputime, but the write speed
> difference between the compression ON vs OFF is pretty large.

That's weird; of course 'lzo' is a lot cheaper than 'zlib', but
in my test the much higher CPU time of the latter was spread
across many CPUs, while in your case it wasn't, even if the
E5645 has 6 CPUs and can do 12 threads. That seemed to point to
some high cost of finding free blocks, that is a very fragmented
free list, or something else.

> dd if=/dev/sdb  of=./testing count=5120 bs=1M status=progress oflag=direct
> 5368709120 bytes (5.4 GB, 5.0 GiB) copied, 26.0685 s, 206 MB/s

The results with 'oflag=direct' are not relevant, because Btrfs
behaves "differently" with that.

> mountflags: 
> (rw,relatime,compress-force=zlib,space_cache=v2,subvolid=5,subvol=/)
[ ... ]
> dd if=/dev/sdb  of=./testing count=5120 bs=1M status=progress conv=fsync
> 5368709120 bytes (5.4 GB, 5.0 GiB) copied, 77.4845 s, 69.3 MB/s
> mountflags: 
> (rw,relatime,compress-force=lzo,space_cache=v2,subvolid=5,subvol=/)
[ ... ]
> dd if=/dev/sdb  of=./testing count=5120 bs=1M status=progress conv=fsync
> 5368709120 bytes (5.4 GB, 5.0 GiB) copied, 122.321 s, 43.9 MB/s

That's pretty good for a RAID5 with 128KiB writes and a 768KiB
stripe size, on a 3ware, and it looks like the hw host adapter
does not have a persistent cache (battery backed usually). My
guess is that watching transfer rates and latencies with 'iostat
-dk -zyx 1' did not happen.

> mountflags: (rw,relatime,space_cache=v2,subvolid=5,subvol=/)
[ ... ]
> dd if=/dev/sdb  of=./testing count=5120 bs=1M status=progress conv=fsync
> 5368709120 bytes (5.4 GB, 5.0 GiB) copied, 10.1033 s, 531 MB/s

I had mentioned in my previous reply the output of 'filefrag'.
That to me seems relevant here, because of RAID5 RMW and maximum
extent size with Btrfs compression and strip/stripe size.

Perhaps redoing the tests with a 128KiB 'bs' *without*
compression would be interesting, perhaps even with 'oflag=sync'
instead of 'conv=fsync'.
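
Something along these lines, as a sketch (the staging file and output paths
are assumptions, and the volume would be mounted without compress-force for
this run):

  dd if=/dev/urandom of=/tmp/rand4g bs=1M count=4096    # stage the source so it is not the bottleneck
  dd if=/tmp/rand4g of=/mnt/arh-backup1/t128k-sync  bs=128k oflag=sync
  dd if=/tmp/rand4g of=/mnt/arh-backup1/t128k-fsync bs=128k conv=fsync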

It is hard for me to see a speed issue here with Btrfs: for
comparison I have done a simple test with a both a 3+1 MD RAID5
set with a 256KiB chunk size and a single block device on
"contemporary" 1T/2TB drives, capable of sequential transfer
rates of 150-190MB/s:

  soft#  grep -A2 sdb3 /proc/mdstat 
  md127 : active raid5 sde3[4] sdd3[2] sdc3[1] sdb3[0]
729808128 blocks super 1.0 level 5, 256k chunk, algorithm 2 [4/4] [UUUU]

with compression:

  soft#  mount -t btrfs -o commit=10,compress-force=zlib /dev/md/test5 
/mnt/test5   
  soft#  mount -t btrfs -o commit=10,compress-force=zlib /dev/sdg3 /mnt/sdg3
  soft#  rm -f /mnt/test5/testfile /mnt/sdg3/testfile

  soft#  /usr/bin/time dd iflag=fullblock if=/dev/sda6 of=/mnt/test5/testfile 
bs=1M count=10000 conv=fsync
  10000+0 records in
  10000+0 records out
  10485760000 bytes (10 GB) copied, 94.3605 s, 111 MB/s
  0.01user 12.59system 1:34.36elapsed 13%CPU (0avgtext+0avgdata 
2932maxresident)k
  13042144inputs+20482144outputs (3major+345minor)pagefaults 0swaps

  soft#  /usr/bin/time dd iflag=fullblock if=/dev/sda6 of=/mnt/sdg3/testfile 
bs=1M count=10000 conv=fsync
  10000+0 records in
  10000+0 records out
  10485760000 bytes (10 GB) copied, 93.5885 s, 112 MB/s
  0.03user 12.35system 1:33.59elapsed 13%CPU (0avgtext+0avgdata 
2940maxresident)k
  13042144inputs+20482400outputs (3major+346minor)pagefaults 0swaps

  soft#  filefrag /mnt/test5/testfile /mnt/sdg3/testfile
  /mnt/test5/testfile: 48945 extents found
  /mnt/sdg3/testfile: 49029 extents found

  soft#  btrfs fi df /mnt/test5/ | grep Data
  Data, single: total=7.00GiB, used=6.55GiB

  soft#  btrfs fi df /mnt/sdg3 | grep Data
  Data, single: total=7.00GiB, used=6.55GiB

  soft#  sysctl vm/drop_caches=3
  vm.drop_caches = 3
  soft#  /usr/bin/time dd iflag=fullblock if=/mnt/test5/testfile bs=1M 
count=10000 of=/dev/zero
  10000+0 records in
  10000+0 records out
  10485760000 bytes (10 GB) copied, 23.2975 s, 450 MB/s
  0.01user 7.59system 0:23.32elapsed 32%CPU (0avgtext+0avgdata 2932maxresident)k
  13759624inputs+0outputs (3major+344minor)pagefaults 0swaps

  soft#  sysctl vm/drop_caches=3
  vm.drop_caches = 3
  soft#  /usr/bin/time dd iflag=fullblock if=/mnt/sdg3/testfile bs=1M 
count=10000 of=/dev/zero
  10000+0 records in
  10000+0 records out
  10485760000 bytes (10 GB) copied, 35.0032 s, 300 MB/s
  0.01user 8.46system 0:35.03elapsed 24%CPU (0avgtext+0avgdata 2924maxresident)k
  13750568inputs+0outputs (3major+345minor)pagefaults 0swaps

and 

Re: Btrfs + compression = slow performance and high cpu usage

2017-07-30 Thread Konstantin V. Gavrilenko
all  0.00  0.00  4.84  5.09  0.00 90.08
14:31:45all  0.17  0.00  4.67  4.75  0.00 90.42
14:31:46all  0.00  0.00  4.60  3.76  0.00 91.64
14:31:47all  0.08  0.00  5.07  3.66  0.00 91.18
14:31:48all  0.00  0.00  5.01  3.68  0.00 91.31
14:31:49all  0.00  0.00  4.76  3.68  0.00 91.56
14:31:50all  0.08  0.00  4.59  3.59  0.00 91.73
14:31:51all  0.00  0.00  2.67  1.92  0.00 95.41






- Original Message -
From: "Peter Grandi" <p...@btrfs.list.sabi.co.uk>
To: "Linux fs Btrfs" <linux-btrfs@vger.kernel.org>
Sent: Friday, 28 July, 2017 8:08:47 PM
Subject: Re: Btrfs + compression = slow performance and high cpu usage

> I am stuck with a problem of btrfs slow performance when using
> compression. [ ... ]

That to me looks like an issue with speed, not performance, and
in particular with PEBCAK issues.

As to high CPU usage, when you find a way to do both compression
and checksumming without using much CPU time, please send patches
urgently :-).

In your case the increase in CPU time is bizarre. I have the
Ubuntu 4.4 "lts-xenial" kernel and what you report does not
happen here (with a few little changes):

  soft#  grep 'model name' /proc/cpuinfo | sort -u
  model name  : AMD FX(tm)-6100 Six-Core Processor
  soft#  cpufreq-info | grep 'current CPU frequency'
current CPU frequency is 3.30 GHz (asserted by call to hardware).
current CPU frequency is 3.30 GHz (asserted by call to hardware).
current CPU frequency is 3.30 GHz (asserted by call to hardware).
current CPU frequency is 3.30 GHz (asserted by call to hardware).
current CPU frequency is 3.30 GHz (asserted by call to hardware).
current CPU frequency is 3.30 GHz (asserted by call to hardware).

  soft#  lsscsi | grep 'sd[ae]'
  [0:0:0:0]diskATA  HFS256G32MNB-220 3L00  /dev/sda
  [5:0:0:0]diskATA  ST2000DM001-1CH1 CC44  /dev/sde

  soft#  mkfs.btrfs -f /dev/sde3
  [ ... ]
  soft#  mount -t btrfs -o 
discard,autodefrag,compress=lzo,compress-force,commit=10 /dev/sde3 /mnt/sde3

  soft#  df /dev/sda6 /mnt/sde3
  Filesystem 1M-blocks  Used Available Use% Mounted on
  /dev/sda6  90048 76046 14003  85% /
  /dev/sde3 23756819235501   1% /mnt/sde3

The above is useful context information that was "amazingly"
omitted from your report.

In dmesg I see (note the "force zlib compression"):

  [327730.917285] BTRFS info (device sde3): turning on discard
  [327730.917294] BTRFS info (device sde3): enabling auto defrag
  [327730.917300] BTRFS info (device sde3): setting 8 feature flag
  [327730.917304] BTRFS info (device sde3): force zlib compression
  [327730.917313] BTRFS info (device sde3): disk space caching is enabled
  [327730.917315] BTRFS: has skinny extents
  [327730.917317] BTRFS: flagging fs with big metadata feature
  [327730.920740] BTRFS: creating UUID tree

and the result is:

  soft#  pv -tpreb /dev/sda6 | time dd iflag=fullblock of=/mnt/sde3/testfile 
bs=1M count=10000 oflag=direct
  10000+0 records in
  10000+0 records out
  10485760000 bytes (10 GB) copied, 112.845 s, 92.9 MB/s
  0.05user 9.93system 1:53.20elapsed 8%CPU (0avgtext+0avgdata 3016maxresident)k
  120inputs+20496000outputs (1major+346minor)pagefaults 0swaps
  9.77GB 0:01:53 [88.3MB/s] [==>]
  11%

  soft#  btrfs fi df /mnt/sde3/
  Data, single: total=10.01GiB, used=9.77GiB
  System, DUP: total=8.00MiB, used=16.00KiB
  Metadata, DUP: total=1.00GiB, used=11.66MiB
  GlobalReserve, single: total=16.00MiB, used=0.00B

As it was running system CPU time was under 20% of one CPU:

  top - 18:57:29 up 3 days, 19:27,  4 users,  load average: 5.44, 2.82, 1.45
  Tasks: 325 total,   1 running, 324 sleeping,   0 stopped,   0 zombie
  %Cpu0  :  0.0 us,  2.3 sy,  0.0 ni, 91.3 id,  6.3 wa,  0.0 hi,  0.0 si,  0.0 
st
  %Cpu1  :  0.0 us,  1.3 sy,  0.0 ni, 78.5 id, 20.2 wa,  0.0 hi,  0.0 si,  0.0 
st
  %Cpu2  :  0.3 us,  5.8 sy,  0.0 ni, 81.0 id, 12.5 wa,  0.0 hi,  0.3 si,  0.0 
st
  %Cpu3  :  0.3 us,  3.4 sy,  0.0 ni, 91.9 id,  4.4 wa,  0.0 hi,  0.0 si,  0.0 
st
  %Cpu4  :  0.3 us, 10.6 sy,  0.0 ni, 55.4 id, 33.7 wa,  0.0 hi,  0.0 si,  0.0 
st
  %Cpu5  :  0.0 us,  0.3 sy,  0.0 ni, 99.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 
st
  KiB Mem:   8120660 total,  5162236 used,  2958424 free,  4440100 buffers
  KiB Swap:0 total,0 used,0 free.   351848 cached Mem

PID  PPID USER  PR  NIVIRTRESDATA  %CPU %MEM TIME+ TTY  
COMMAND
  21047 21046 root  20   08872   26161364  12.9  0.0   0:02.31 
pts/3dd iflag=fullblo+
  21045  3535 root  20   07928   1948  

Re: Btrfs + compression = slow performance and high cpu usage

2017-07-28 Thread Peter Grandi
In addition to my previous "it does not happen here" comment, if
someone is reading this thread, there are some other interesting
details:

> When the compression is turned off, I am able to get the
> maximum 500-600 mb/s write speed on this disk (raid array)
> with minimal cpu usage.

No details on whether it is a parity RAID or not.

> btrfs device usage /mnt/arh-backup1/
> /dev/sda, ID: 2
>Device size:21.83TiB
>Device slack:  0.00B
>Data,single: 9.29TiB
>Metadata,single:46.00GiB
>System,single:  32.00MiB
>Unallocated:12.49TiB

That's exactly 24TB of "Device size", of which around 45% are
used, and the string "backup" may suggest that the content is
backups, which may indicate a very fragmented freespace.
Of course compression does not help with that, in my freshly
created Btrfs volume I get as expected:

  soft#  umount /mnt/sde3
  soft#  mount -t btrfs -o commit=10 /dev/sde3 /mnt/sde3
 

  soft#  /usr/bin/time dd iflag=fullblock if=/dev/sda6 of=/mnt/sde3/testfile 
bs=1M count=10000 conv=fsync
  10000+0 records in
  10000+0 records out
  10485760000 bytes (10 GB) copied, 103.747 s, 101 MB/s
  0.00user 11.56system 1:44.86elapsed 11%CPU (0avgtext+0avgdata 
3072maxresident)k
  20480672inputs+20498272outputs (1major+349minor)pagefaults 0swaps

  soft#  filefrag /mnt/sde3/testfile 
  /mnt/sde3/testfile: 11 extents found

versus:

  soft#  umount /mnt/sde3   
 
  soft#  mount -t btrfs -o commit=10,compress=lzo,compress-force /dev/sde3 
/mnt/sde3

  soft#  /usr/bin/time dd iflag=fullblock if=/dev/sda6 of=/mnt/sde3/testfile 
bs=1M count=10000 conv=fsync
  10000+0 records in
  10000+0 records out
  10485760000 bytes (10 GB) copied, 109.051 s, 96.2 MB/s
  0.02user 13.03system 1:49.49elapsed 11%CPU (0avgtext+0avgdata 
3068maxresident)k
  20494784inputs+20492320outputs (1major+347minor)pagefaults 0swaps

  soft#  filefrag /mnt/sde3/testfile 
  /mnt/sde3/testfile: 49287 extents found

Most of the latter extents are mercifully rather contiguous, their
size is just limited by the compression code, here is an extract
from 'filefrag -v' from around the middle:

  24757:  1321888.. 1321919:   11339579..  11339610: 32:   11339594:
  24758:  1321920.. 1321951:   11339597..  11339628: 32:   11339611:
  24759:  1321952.. 1321983:   11339615..  11339646: 32:   11339629:
  24760:  1321984.. 1322015:   11339632..  11339663: 32:   11339647:
  24761:  1322016.. 1322047:   11339649..  11339680: 32:   11339664:
  24762:  1322048.. 1322079:   11339667..  11339698: 32:   11339681:
  24763:  1322080.. 1322111:   11339686..  11339717: 32:   11339699:
  24764:  1322112.. 1322143:   11339703..  11339734: 32:   11339718:
  24765:  1322144.. 1322175:   11339720..  11339751: 32:   11339735:
  24766:  1322176.. 1322207:   11339737..  11339768: 32:   11339752:
  24767:  1322208.. 1322239:   11339754..  11339785: 32:   11339769:
  24768:  1322240.. 1322271:   11339771..  11339802: 32:   11339786:
  24769:  1322272.. 1322303:   11339789..  11339820: 32:   11339803:

But again this is on a fresh empty Btrfs volume.


Re: Btrfs + compression = slow performance and high cpu usage

2017-07-28 Thread Hugo Mills
On Fri, Jul 28, 2017 at 06:20:14PM +0000, William Muriithi wrote:
> Hi Roman,
> 
> > autodefrag
> 
> This sure sounded like a good thing to enable? on paper? right?...
> 
> The moment you see anything remotely weird about btrfs, this is the first 
> thing you have to disable and retest without. Oh wait, the first would be 
> qgroups, this one is second.
> 
> What's the problem with autodefrag?  I am also using it, so you caught my
> attention when you implied that it shouldn't be used.  According to the docs,
> it seems like one of the very mature features of the filesystem.  See below
> for the doc I am referring to:
> 
> https://btrfs.wiki.kernel.org/index.php/Status
> 
> I am using it as I assumed it could prevent the filesystem becoming too
> fragmented long term, but never thought there was a price to pay for using it.

   It introduces additional I/O on writes, as it modifies a small area
surrounding any write or cluster of writes.

   I'm not aware of it causing massive slowdowns, in the way that
qgroups do in some situations.

   If your system is already marginal in terms of being able to
support the I/O required, then turning on autodefrag will make things
worse (but you may be heading for _much_ worse performance in the
future as the FS becomes more fragmented -- depending on your write
patterns and use case).

   Hugo.

-- 
Hugo Mills | Great oxymorons of the world, no. 6:
hugo@... carfax.org.uk | Mature Student
http://carfax.org.uk/  |
PGP: E2AB1DE4  |




RE: Btrfs + compression = slow performance and high cpu usage

2017-07-28 Thread William Muriithi
Hi Roman,

> autodefrag

This sure sounded like a good thing to enable? on paper? right?...

The moment you see anything remotely weird about btrfs, this is the first thing 
you have to disable and retest without. Oh wait, the first would be qgroups, 
this one is second.

What's the problem with autodefrag?  I am also using it, so you caught my
attention when you implied that it shouldn't be used.  According to the docs,
it seems like one of the very mature features of the filesystem.  See below
for the doc I am referring to:

https://btrfs.wiki.kernel.org/index.php/Status

I am using it as I assumed it could prevent the filesystem becoming too
fragmented long term, but never thought there was a price to pay for using it.

Regards,
William



Re: Btrfs + compression = slow performance and high cpu usage

2017-07-28 Thread Peter Grandi
> I am stuck with a problem of btrfs slow performance when using
> compression. [ ... ]

That to me looks like an issue with speed, not performance, and
in particular with PEBCAK issues.

As to high CPU usage, when you find a way to do both compression
and checksumming without using much CPU time, please send patches
urgently :-).

In your case the increase in CPU time is bizarre. I have the
Ubuntu 4.4 "lts-xenial" kernel and what you report does not
happen here (with a few little changes):

  soft#  grep 'model name' /proc/cpuinfo | sort -u
  model name  : AMD FX(tm)-6100 Six-Core Processor
  soft#  cpufreq-info | grep 'current CPU frequency'
current CPU frequency is 3.30 GHz (asserted by call to hardware).
current CPU frequency is 3.30 GHz (asserted by call to hardware).
current CPU frequency is 3.30 GHz (asserted by call to hardware).
current CPU frequency is 3.30 GHz (asserted by call to hardware).
current CPU frequency is 3.30 GHz (asserted by call to hardware).
current CPU frequency is 3.30 GHz (asserted by call to hardware).

  soft#  lsscsi | grep 'sd[ae]'
  [0:0:0:0]diskATA  HFS256G32MNB-220 3L00  /dev/sda
  [5:0:0:0]diskATA  ST2000DM001-1CH1 CC44  /dev/sde

  soft#  mkfs.btrfs -f /dev/sde3
  [ ... ]
  soft#  mount -t btrfs -o 
discard,autodefrag,compress=lzo,compress-force,commit=10 /dev/sde3 /mnt/sde3

  soft#  df /dev/sda6 /mnt/sde3
  Filesystem 1M-blocks  Used Available Use% Mounted on
  /dev/sda6  90048 76046 14003  85% /
  /dev/sde3 23756819235501   1% /mnt/sde3

The above is useful context information that was "amazingly"
omitted from your report.

In dmesg I see (note the "force zlib compression"):

  [327730.917285] BTRFS info (device sde3): turning on discard
  [327730.917294] BTRFS info (device sde3): enabling auto defrag
  [327730.917300] BTRFS info (device sde3): setting 8 feature flag
  [327730.917304] BTRFS info (device sde3): force zlib compression
  [327730.917313] BTRFS info (device sde3): disk space caching is enabled
  [327730.917315] BTRFS: has skinny extents
  [327730.917317] BTRFS: flagging fs with big metadata feature
  [327730.920740] BTRFS: creating UUID tree

and the result is:

  soft#  pv -tpreb /dev/sda6 | time dd iflag=fullblock of=/mnt/sde3/testfile 
bs=1M count=10000 oflag=direct
  10000+0 records in
  10000+0 records out
  10485760000 bytes (10 GB) copied, 112.845 s, 92.9 MB/s
  0.05user 9.93system 1:53.20elapsed 8%CPU (0avgtext+0avgdata 3016maxresident)k
  120inputs+20496000outputs (1major+346minor)pagefaults 0swaps
  9.77GB 0:01:53 [88.3MB/s] [==>]
  11%

  soft#  btrfs fi df /mnt/sde3/
  Data, single: total=10.01GiB, used=9.77GiB
  System, DUP: total=8.00MiB, used=16.00KiB
  Metadata, DUP: total=1.00GiB, used=11.66MiB
  GlobalReserve, single: total=16.00MiB, used=0.00B

As it was running system CPU time was under 20% of one CPU:

  top - 18:57:29 up 3 days, 19:27,  4 users,  load average: 5.44, 2.82, 1.45
  Tasks: 325 total,   1 running, 324 sleeping,   0 stopped,   0 zombie
  %Cpu0  :  0.0 us,  2.3 sy,  0.0 ni, 91.3 id,  6.3 wa,  0.0 hi,  0.0 si,  0.0 
st
  %Cpu1  :  0.0 us,  1.3 sy,  0.0 ni, 78.5 id, 20.2 wa,  0.0 hi,  0.0 si,  0.0 
st
  %Cpu2  :  0.3 us,  5.8 sy,  0.0 ni, 81.0 id, 12.5 wa,  0.0 hi,  0.3 si,  0.0 
st
  %Cpu3  :  0.3 us,  3.4 sy,  0.0 ni, 91.9 id,  4.4 wa,  0.0 hi,  0.0 si,  0.0 
st
  %Cpu4  :  0.3 us, 10.6 sy,  0.0 ni, 55.4 id, 33.7 wa,  0.0 hi,  0.0 si,  0.0 
st
  %Cpu5  :  0.0 us,  0.3 sy,  0.0 ni, 99.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 
st
  KiB Mem:   8120660 total,  5162236 used,  2958424 free,  4440100 buffers
  KiB Swap:0 total,0 used,0 free.   351848 cached Mem

PID  PPID USER  PR  NIVIRTRESDATA  %CPU %MEM TIME+ TTY  
COMMAND
  21047 21046 root  20   08872   26161364  12.9  0.0   0:02.31 
pts/3dd iflag=fullblo+
  21045  3535 root  20   07928   1948 460  12.3  0.0   0:00.72 
pts/3pv -tpreb /dev/s+
  21019 2 root  20   0   0  0   0   1.3  0.0   0:42.88 ?
[kworker/u16:1]

Of course "oflag=direct" is a rather "optimistic" option in this
context, so I tried again with something more sensible:

  soft#  pv -tpreb /dev/sda6 | time dd iflag=fullblock of=/mnt/sde3/testfile 
bs=1M count=10000 conv=fsync
  10000+0 records in
  10000+0 records out
  10485760000 bytes (10 GB) copied, 110.523 s, 94.9 MB/s
  0.03user 8.94system 1:50.71elapsed 8%CPU (0avgtext+0avgdata 3024maxresident)k
  136inputs+20499648outputs (1major+348minor)pagefaults 0swaps
  9.77GB 0:01:50 [90.3MB/s] [==>] 11%

  soft#  btrfs fi df /mnt/sde3/
  Data, single: total=7.01GiB, used=6.35GiB
  System, DUP: total=8.00MiB, used=16.00KiB
  Metadata, DUP: total=1.00GiB, used=15.81MiB
  GlobalReserve, 

Re: Btrfs + compression = slow performance and high cpu usage

2017-07-28 Thread Roman Mamedov
On Fri, 28 Jul 2017 17:40:50 +0100 (BST)
"Konstantin V. Gavrilenko"  wrote:

> Hello list, 
> 
> I am stuck with a problem of btrfs slow performance when using compression.
> 
> when the compress-force=lzo mount flag is enabled, the performance drops to 
> 30-40 mb/s and one of the btrfs processes utilises 100% cpu time.
> mount options: btrfs 
> relatime,discard,autodefrag,compress=lzo,compress-force,space_cache=v2,commit=10

It does not work like that, you need to set compress-force=lzo (and remove
compress=).

With your setup I believe you currently use compress-force[=zlib](default),
overriding compress=lzo, since it's later in the options order.
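
I.e. something like this (a sketch; the mount point is assumed, and a remount
is enough to change the compression options):

  mount -o remount,compress-force=lzo /mnt/arh-backup1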

Secondly,

> autodefrag

This sure sounded like a good thing to enable? on paper? right?...

The moment you see anything remotely weird about btrfs, this is the first
thing you have to disable and retest without. Oh wait, the first would be
qgroups, this one is second.
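
That retest does not need the filesystem to be recreated either; a sketch
(mount point assumed):

  mount -o remount,noautodefrag /mnt/arh-backup1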

Finally, what is the reasoning behind "commit=10", and did you check with the
default value of 30?

-- 
With respect,
Roman