Re: btrfs issue with mariadb incremental backup

2017-08-10 Thread siranee . ja
Hi Chris,
The kernel version that I tested is "4.4.0-89-generic", since I am testing inside an Ubuntu LXD container;
if I want to change the kernel version I have to upgrade the host box.

As you suggested, I used rsync to compare the subvolumes, and I found the point:
the subvolumes differ only after I start deleting old subvolumes on
machine A.
The steps (cron jobs) are:

30 08 * * * root /root/script/backup/backupsnap.sh root password /var/lib/mariadb/mysql >> /var/log/btrfs_snap.log
05 09 * * * root /root/script/backupbtrfs_inc.sh /var/lib/mariadb 192.168.45.166 /var/lib/mariadb >> /var/log/btrfs_send.log
30 19 * * * root /root/script/delete_btrfs_sub_snap_volume.sh /var/lib/mariadb 7 >> /var/log/btrfs_del.log


The following script keeps only the 7 most recent snapshots:
[root@backuplogC7 ~]# cat /root/script/delete_btrfs_sub_snap_volume.sh
basepath=$1
keepcount=$2
# count the existing snapshot subvolumes (field 14 of "btrfs sub list -s" is the path)
havecount=`btrfs sub list -s ${basepath}|cut -d' ' -f14|wc -l`
delcount=$[$keepcount-$havecount]
datet=$(date +%Y%m%d%H%M)
echo "Start Delete ${datet}"
if [ $delcount -lt 0 ]; then
    # list only snapshot subvolumes; "head ${delcount}" (e.g. "head -3") keeps
    # the oldest entries (the list is ordered by creation), which are then deleted
    for i in `btrfs sub list -s ${basepath}|cut -d' ' -f14|head ${delcount}`
    do
        echo "btrfs sub delete ${basepath}/$i"
        btrfs sub delete ${basepath}/$i
        btrfs sub sync ${basepath}
    done
else
    echo "$delcount >= 0, nothing to delete"
fi
echo "Stop Delete ${datet}"

Does this mean my delete script is not the proper way to purge old btrfs
snapshots on the source?
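
For comparison, a somewhat more defensive rotation could look like the sketch
below. It assumes the snapshot names embed a sortable timestamp (such as
mysql_YYYYMMDDHHMM) and sorts on the path itself instead of relying on a fixed
cut field:

#!/bin/bash
# Keep only the newest $2 snapshots under $1 (illustrative sketch only).
basepath=$1
keepcount=$2
# The last field of "btrfs sub list -s" is the snapshot path; after sorting,
# "head -n -$keepcount" prints everything except the newest $keepcount entries,
# i.e. the older snapshots, which are then deleted.
btrfs sub list -s "${basepath}" | awk '{print $NF}' | sort | head -n -"${keepcount}" |
while read -r snap; do
    echo "btrfs sub delete ${basepath}/${snap}"
    btrfs sub delete "${basepath}/${snap}"
done
btrfs sub sync "${basepath}"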

Best Regards,

Siranee Jarwachirakul.

> On Wed, Aug 9, 2017 at 12:36 AM,   wrote:
>
>>   488  btrfs sub snap mysql_201707230830 mysql
>>   489  systemctl start mariadb
>>   490  btrfs sub list .
>>   491  cat /var/log/mariadb/mariadb.log
>
> OK so mysql_201707230830 once on machine B is inconsistent somehow. So
> the questions I have are:
>
> Is mysql_201707230830 on machine A really identical to
> mysql_201707230830 on machine B? You can do an rsync -anc (double
> check those options) which should independently check whether those
> two subvolumes are in fact identical. The -n is a no op, which doesn't
> really matter much because as read only subvolumes any attempt to sync
> will just result in noisy messages. The -c causes rsync to do its own
> checksum verification on both sides.
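
A comparison along those lines might look like the sketch below; the host and
subvolume paths are assumptions taken from the cron jobs above, so double-check
the options against your rsync version:

# -a archive, -n dry run, -c compare by checksum, -i itemize any differences
rsync -anci /var/lib/mariadb/mysql_201707230830/ \
      root@192.168.45.166:/var/lib/mariadb/mysql_201707230830/
# no itemized output means the two read-only subvolumes have identical contents
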
>
> If the subvolumes are different, we need to find out why.
>
> If the subvolumes are the same, then I wonder if you can reproduce the
> mariadb complaint on machine A merely by making a rw snapshot of
> mysql_201707230830 and trying to start it. If so, then it's not a send
> receive problem, it sounds like the snapshot itself is inconsistent,
> maybe mariadb hasn't actually completely closed out the database at
> the time the read only snapshot was taken? I'm not sure.
>
> If the subvolumes are different, I'm going to recommend updating at
> least the btrfs-progs because 4.4 is kinda old at this point. The
> kernel code is what's mainly responsible for the send stream, and the
> user space code is mainly responsible for receiving. And I don't off
> hand know or want to look up all the send receive changes between 4.4
> and 4.12 to speculate on whether this has already been fixed.
>
> What's the kernel version?
>
> --
> Chris Murphy
>




Re: [PATCH v5 3/5] btrfs: Add zstd support

2017-08-10 Thread Nick Terrell
On 8/10/17, 7:13 PM, "Adam Borowski"  wrote:
> On Wed, Aug 09, 2017 at 07:39:02PM -0700, Nick Terrell wrote:
>> Add zstd compression and decompression support to BtrFS.
> 
> Re-tested on arm64, amd64 and i386, this time everything seems fine so far.
> 
> As I'm too lazy to have a separate test setup for the zlib level patch,
> I'm using a dummy implementation:
> 
> --- a/fs/btrfs/zstd.c
> +++ b/fs/btrfs/zstd.c
> @@ -423,10 +423,16 @@ static int zstd_decompress(struct list_head *ws, unsigned char *data_in,
> return ret;
>  }
>  
> +static void zstd_set_level(struct list_head *ws, unsigned int type)
> +{
> +   // TODO
> +}
> +
>  const struct btrfs_compress_op btrfs_zstd_compress = {
> .alloc_workspace = zstd_alloc_workspace,
> .free_workspace = zstd_free_workspace,
> .compress_pages = zstd_compress_pages,
> .decompress_bio = zstd_decompress_bio,
> .decompress = zstd_decompress,
> +   .set_level = zstd_set_level,
>  };
> 
> 
> It might be worthwhile to do some early testing using different levels,
> though.

I'll run some preliminary tests and make sure nothing blows up and the
compression ratio looks good.

> 
> 
> Meow!
> -- 
> ⢀⣴⠾⠻⢶⣦⠀ 
> ⣾⠁⢠⠒⠀⣿⡁ A dumb species has no way to open a tuna can.
> ⢿⡄⠘⠷⠚⠋⠀ A smart species invents a can opener.
> ⠈⠳⣄ A master species delegates.




Re: [PATCH v5 3/5] btrfs: Add zstd support

2017-08-10 Thread Adam Borowski
On Wed, Aug 09, 2017 at 07:39:02PM -0700, Nick Terrell wrote:
> Add zstd compression and decompression support to BtrFS.

Re-tested on arm64, amd64 and i386, this time everything seems fine so far.

As I'm too lazy to have a separate test setup for the zlib level patch,
I'm using a dummy implementation:

--- a/fs/btrfs/zstd.c
+++ b/fs/btrfs/zstd.c
@@ -423,10 +423,16 @@ static int zstd_decompress(struct list_head *ws, unsigned char *data_in,
return ret;
 }
 
+static void zstd_set_level(struct list_head *ws, unsigned int type)
+{
+   // TODO
+}
+
 const struct btrfs_compress_op btrfs_zstd_compress = {
.alloc_workspace = zstd_alloc_workspace,
.free_workspace = zstd_free_workspace,
.compress_pages = zstd_compress_pages,
.decompress_bio = zstd_decompress_bio,
.decompress = zstd_decompress,
+   .set_level = zstd_set_level,
 };


It might be worthwhile to do some early testing using different levels,
though.


Meow!
-- 
⢀⣴⠾⠻⢶⣦⠀ 
⣾⠁⢠⠒⠀⣿⡁ A dumb species has no way to open a tuna can.
⢿⡄⠘⠷⠚⠋⠀ A smart species invents a can opener.
⠈⠳⣄ A master species delegates.


Re: [PATCH v3] Btrfs: fix out of bounds array access while reading extent buffer

2017-08-10 Thread Duncan
Hugo Mills posted on Wed, 09 Aug 2017 18:15:29 + as excerpted:

> On Wed, Aug 09, 2017 at 11:10:16AM -0600, Liu Bo wrote:
>> There is a cornel case that slip through the checkers in functions
>  ^^ corner
> 
>Sorry, that's been bugging me every time it goes past. A cornel is
> a kind of tree, apparently.

LOL!  Me too but likely more so because I actually live in Cornell 
Condominiums so seeing "cornel" in this context is a rather confusing 
context switch, to say the least!  I let it slip the first time as I 
never expected it to make it to a v3, but...

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



Re: Replace missing disc => strange result!?

2017-08-10 Thread Duncan
Cloud Admin posted on Thu, 10 Aug 2017 22:00:08 +0200 as excerpted:

> Hi,
> I had a disc failure and must replace it. I followed the description on
>  https://btrfs.wiki.kernel.org/index.php/Using_Btrfs_with_Multiple_Devices
> and started the replacement.
> Setup is a two disc RAID1!
> After it was done, I called 'btrfs fi us /mn/btrfsroot' and I got the
> output below. What is wrong?
> Is it a rebalancing issue? I thought the replace command started it
> automatically...

...
 
> Data,single: Size:1.00GiB, Used:0.00B
>    /dev/mapper/luks-3ff6c412-4d3a-4d33-85a3-cc70e95c26f8    1.00GiB

...

It's not entirely clear what you're referring to with the "what's wrong" 
question, but I'll assume it's all those single and dup chunks that 
aren't raid1.

Unlike device delete, which does an implicit rebalance, replace simply 
replaces one device with another in terms of content, and doesn't 
rebalance anything on the remaining devices.  This tends to make it much 
faster, with less danger of something else bad (like another device going 
bad) happening in the process.  But it copies verbatim as much as 
possible, so unlike device delete with its implicit balance, or an 
explicit balance, existing chunks remain as they were.

Which means any existing single and dup chunks didn't get removed, as I'm 
guessing you expected.

But most or all of them are 0 used, so an explicit rebalance to eliminate 
them should be quite short as it'll just delete the references to them.  
Try this (path from your post above):

btrfs balance start -dusage=0 -musage=0 /mn/btrfsroot

That should eliminate the 0-usage chunks, making the usage output easier 
to follow even if you do need to post updates because my guess as to what 
you were referring to with the "what's wrong" was incorrect.  And as I 
said it should be much faster (almost instantaneous on ssd, probably not 
/quite/ that on spinning rust) than rebalancing chunks that weren't 
empty, too.  =:^)
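
If any single or dup chunks with nonzero usage remain after that (the metadata 
and system dup chunks above show a little usage), a follow-up "soft" convert 
balance should move them back to raid1; this is a sketch, so verify the filter 
syntax against your btrfs-progs version:

# convert only chunks that aren't already raid1 ("soft" skips the ones that are)
btrfs balance start -dconvert=raid1,soft -mconvert=raid1,soft /mn/btrfsroot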

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



Replace missing disc => strange result!?

2017-08-10 Thread Cloud Admin
Hi,
I had a disc failure and had to replace it. I followed the description at
https://btrfs.wiki.kernel.org/index.php/Using_Btrfs_with_Multiple_Devices
and started the replacement.
The setup is a two-disc RAID1.
After it was done, I called 'btrfs fi us /mn/btrfsroot' and I got the
output below. What is wrong?
Is it a rebalancing issue? I thought the replace command started it
automatically...

Overall:
    Device size:                   3.64TiB
    Device allocated:              1.04TiB
    Device unallocated:            2.60TiB
    Device missing:                  0.00B
    Used:                        519.76GiB
    Free (estimated):              1.56TiB  (min: 1.56TiB)
    Data ratio:                       2.00
    Metadata ratio:                   1.60
    Global reserve:              279.11MiB  (used: 0.00B)

Data,single: Size:1.00GiB, Used:0.00B
   /dev/mapper/luks-3ff6c412-4d3a-4d33-85a3-cc70e95c26f8    1.00GiB

Data,RAID1: Size:265.00GiB, Used:259.60GiB
   /dev/mapper/luks-3ff6c412-4d3a-4d33-85a3-cc70e95c26f8  265.00GiB
   /dev/mapper/luks-ff4bf5da-48af-4563-abb2-db083bd01512  265.00GiB

Data,DUP: Size:264.00GiB, Used:0.00B
   /dev/mapper/luks-3ff6c412-4d3a-4d33-85a3-cc70e95c26f8  528.00GiB

Metadata,single: Size:1.00GiB, Used:0.00B
   /dev/mapper/luks-3ff6c412-4d3a-4d33-85a3-cc70e95c26f8    1.00GiB

Metadata,RAID1: Size:1.00GiB, Used:286.03MiB
   /dev/mapper/luks-3ff6c412-4d3a-4d33-85a3-cc70e95c26f8    1.00GiB
   /dev/mapper/luks-ff4bf5da-48af-4563-abb2-db083bd01512    1.00GiB

Metadata,DUP: Size:512.00MiB, Used:112.00KiB
   /dev/mapper/luks-3ff6c412-4d3a-4d33-85a3-cc70e95c26f8    1.00GiB

System,single: Size:32.00MiB, Used:0.00B
   /dev/mapper/luks-3ff6c412-4d3a-4d33-85a3-cc70e95c26f8   32.00MiB

System,RAID1: Size:8.00MiB, Used:48.00KiB
   /dev/mapper/luks-3ff6c412-4d3a-4d33-85a3-cc70e95c26f8    8.00MiB
   /dev/mapper/luks-ff4bf5da-48af-4563-abb2-db083bd01512    8.00MiB

System,DUP: Size:32.00MiB, Used:48.00KiB
   /dev/mapper/luks-3ff6c412-4d3a-4d33-85a3-cc70e95c26f8   64.00MiB

Unallocated:
   /dev/mapper/luks-3ff6c412-4d3a-4d33-85a3-cc70e95c26f8    1.04TiB
   /dev/mapper/luks-ff4bf5da-48af-4563-abb2-db083bd01512    1.56TiB

Bye
Frank


Re: [PATCH v5 2/5] lib: Add zstd modules

2017-08-10 Thread Austin S. Hemmelgarn

On 2017-08-10 15:25, Hugo Mills wrote:

On Thu, Aug 10, 2017 at 01:41:21PM -0400, Chris Mason wrote:

On 08/10/2017 04:30 AM, Eric Biggers wrote:


These benchmarks are misleading because they compress the whole file as a
single stream without resetting the dictionary, which isn't how data will
typically be compressed in kernel mode.  With filesystem compression the data
has to be divided into small chunks that can each be decompressed independently.
That eliminates one of the primary advantages of Zstandard (support for large
dictionary sizes).


I did btrfs benchmarks of kernel trees and other normal data sets as
well.  The numbers were in line with what Nick is posting here.
zstd is a big win over both lzo and zlib from a btrfs point of view.

It's true Nick's patches only support a single compression level in
btrfs, but that's because btrfs doesn't have a way to pass in the
compression ratio.  It could easily be a mount option, it was just
outside the scope of Nick's initial work.


Could we please not add more mount options? I get that they're easy
to implement, but it's a very blunt instrument. What we tend to see
(with both nodatacow and compress) is people using the mount options,
then asking for exceptions, discovering that they can't do that, and
then falling back to doing it with attributes or btrfs properties.
Could we just start with btrfs properties this time round, and cut out
the mount option part of this cycle.
AFAIUI, the intent is to extend the compression type specification for 
both the mount options and the property, not to add a new mount option. 
I think we all agree that `mount -o compress=zstd3` is a lot better than 
`mount -o compress=zstd,compresslevel=3`.


In the long run, it'd be great to see most of the btrfs-specific
mount options get deprecated and ultimately removed entirely, in
favour of attributes/properties, where feasible.
Are properties set on the root subvolume inherited properly?  Because 
unless they are, we can't get the same semantics.


Two other counter arguments on completely removing BTRFS-specific mount 
options:
1. It's a lot easier and a lot more clearly defined to change things 
that affect global behavior of the FS by a remount than having to 
iterate everything in the FS to update properties.  If I'm disabling 
autodefrag, I'd much rather just `mount -o remount,noautodefrag` than 
`find / -xdev -exec btrfs property set \{\} autodefrag false`, as the 
first will take effect for everything simultaneously and run 
exponentially quicker.
2. There are some things that don't make sense as per-object settings or 
are otherwise nonsensical on objects.  Many, but not all, of the BTRFS 
specific mount options fall into this category IMO, with the notable 
exception of compress[-force], [no]autodefrag, [no]datacow, and 
[no]datasum.  Some other options do make sense as properties of the 
filesystem (commit, flushoncommit, {inode,space}_cache, max_inline, 
metadata_ratio, [no]ssd, and [no]treelog are such options), but many are 
one-off options that affect behavior on mount (like skip_balance, 
clear_cache, nologreplay, norecovery, usebackuproot, and subvol).
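
As a rough sketch of the two mechanisms being compared, assuming the zstd 
support from this series has landed (the paths here are hypothetical, and any 
level syntax such as "zstd3" is still to be decided):

# filesystem-wide, via mount option: one remount changes behaviour everywhere
mount -o remount,compress=zstd /mnt/data

# per-object, via properties: applied to individual files or subvolumes
btrfs property set /mnt/data/archive compression zstd
btrfs property get /mnt/data/archive compression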



Re: [PATCH v5 2/5] lib: Add zstd modules

2017-08-10 Thread Hugo Mills
On Thu, Aug 10, 2017 at 01:41:21PM -0400, Chris Mason wrote:
> On 08/10/2017 04:30 AM, Eric Biggers wrote:
> >
> >These benchmarks are misleading because they compress the whole file as a
> >single stream without resetting the dictionary, which isn't how data will
> >typically be compressed in kernel mode.  With filesystem compression the data
> >has to be divided into small chunks that can each be decompressed 
> >independently.
> >That eliminates one of the primary advantages of Zstandard (support for large
> >dictionary sizes).
> 
> I did btrfs benchmarks of kernel trees and other normal data sets as
> well.  The numbers were in line with what Nick is posting here.
> zstd is a big win over both lzo and zlib from a btrfs point of view.
> 
> It's true Nick's patches only support a single compression level in
> btrfs, but that's because btrfs doesn't have a way to pass in the
> compression ratio.  It could easily be a mount option, it was just
> outside the scope of Nick's initial work.

   Could we please not add more mount options? I get that they're easy
to implement, but it's a very blunt instrument. What we tend to see
(with both nodatacow and compress) is people using the mount options,
then asking for exceptions, discovering that they can't do that, and
then falling back to doing it with attributes or btrfs properties.
Could we just start with btrfs properties this time round, and cut out
the mount option part of this cycle.

   In the long run, it'd be great to see most of the btrfs-specific
mount options get deprecated and ultimately removed entirely, in
favour of attributes/properties, where feasible.

   Hugo.

-- 
Hugo Mills | Klytus! Are your men on the right pills? Maybe you
hugo@... carfax.org.uk | should execute their trainer!
http://carfax.org.uk/  |
PGP: E2AB1DE4  |  Ming the Merciless, Flash Gordon




Re: [PATCH v5 2/5] lib: Add zstd modules

2017-08-10 Thread Nick Terrell
On 8/10/17, 10:48 AM, "Austin S. Hemmelgarn"  wrote:
>On 2017-08-10 13:24, Eric Biggers wrote:
>>On Thu, Aug 10, 2017 at 07:32:18AM -0400, Austin S. Hemmelgarn wrote:
>>>On 2017-08-10 04:30, Eric Biggers wrote:
>>>>On Wed, Aug 09, 2017 at 07:35:53PM -0700, Nick Terrell wrote:
>>>>>
>>>>> It can compress at speeds approaching lz4, and quality approaching lzma.
>>>>
>>>> Well, for a very loose definition of "approaching", and certainly not at the
>>>> same time.  I doubt there's a use case for using the highest compression levels
>>>> in kernel mode --- especially the ones using zstd_opt.h.
>>> Large data-sets with WORM access patterns and infrequent writes
>>> immediately come to mind as a use case for the highest compression
>>> level.
>>>
>>> As a more specific example, the company I work for has a very large
>>> amount of documentation, and we keep all old versions.  This is all
>>> stored on a file server which is currently using BTRFS.  Once a
>>> document is written, it's almost never rewritten, so write
>>> performance only matters for the first write.  However, they're read
>>> back pretty frequently, so we need good read performance.  As of
>>> right now, the system is set to use LZO compression by default, and
>>> then when a new document is added, the previous version of that
>>> document gets re-compressed using zlib compression, which actually
>>> results in pretty significant space savings most of the time.  I
>>> would absolutely love to use zstd compression with this system with
>>> the highest compression level, because most people don't care how
>>> long it takes to write the file out, but they do care how long it
>>> takes to read a file (even if it's an older version).
>> 
>> This may be a reasonable use case, but note this cannot just be the regular
>> "zstd" compression setting, since filesystem compression by default must 
>> provide
>> reasonable performance for many different access patterns.  See the patch in
>> this series which actually adds zstd compression to btrfs; it only uses 
>> level 1.
>> I do not see a patch which adds a higher compression mode.  It would need to 
>> be
>> a special setting like "zstdhc" that users could opt-in to on specific
>> directories.  It also would need to be compared to simply compressing in
>> userspace.  In many cases compressing in userspace is probably the better
>> solution for the use case in question because it works on any filesystem, 
>> allows
>> using any compression algorithm, and if random access is not needed it is
>> possible to compress each file as a single stream (like a .xz file), which
>> produces a much better compression ratio than the block-by-block compression
>> that filesystems have to use.
> There has been discussion as well as (I think) initial patches merged 
> for support of specifying the compression level for algorithms which 
> support multiple compression levels in BTRFS.  I was actually under the 
> impression that we had decided to use level 3 as the default for zstd, 
> but that apparently isn't the case, and with the benchmark issues, it 
> may not be once proper benchmarks are run.

There are some initial patches to add compression levels to BtrFS [1]. Once
it's ready, we can add compression levels to zstd. The default compression
level in the current patch is 3.

[1] https://lkml.kernel.org/r/20170724172939.24527-1-dste...@suse.com



Re: [PATCH v5 2/5] lib: Add zstd modules

2017-08-10 Thread Nick Terrell
On 8/10/17, 1:30 AM, "Eric Biggers"  wrote:
> On Wed, Aug 09, 2017 at 07:35:53PM -0700, Nick Terrell wrote:
>>
>> It can compress at speeds approaching lz4, and quality approaching lzma.
> 
> Well, for a very loose definition of "approaching", and certainly not at the
> same time.  I doubt there's a use case for using the highest compression 
> levels
> in kernel mode --- especially the ones using zstd_opt.h.
> 
>> 
>> The code was ported from the upstream zstd source repository.
> 
> What version?

zstd-1.1.4 with patches applied from upstream. I'll include it in the
next patch version.

>> `linux/zstd.h` header was modified to match linux kernel style.
>> The cross-platform and allocation code was stripped out. Instead zstd
>> requires the caller to pass a preallocated workspace. The source files
>> were clang-formatted [1] to match the Linux Kernel style as much as
>> possible. 
> 
> It would be easier to compare to the upstream version if it was not all
> reformatted.  There is a chance that bugs were introduced by Linux-specific
> changes, and it would be nice if they could be easily reviewed.  (Also I don't
> know what clang-format settings you used, but there are still a lot of
> differences from the Linux coding style.)

The clang-format settings I used are available in the zstd repo [1]. I left
the line length long, since it looked terrible otherwise. I set up a branch
in my zstd GitHub fork called "original-formatted" [2]. I've taken the source
I based the kernel patches off of [3] and ran clang-format without any other
changes. If you have any suggestions to improve the clang-formatting
please let me know.

>> 
>> I benchmarked zstd compression as a special character device. I ran zstd
>> and zlib compression at several levels, as well as performing no
>> compression, which measure the time spent copying the data to kernel space.
>> Data is passed to the compressor 4096 B at a time. The benchmark file is
>> located in the upstream zstd source repository under
>> `contrib/linux-kernel/zstd_compress_test.c` [2].
>> 
>> I ran the benchmarks on a Ubuntu 14.04 VM with 2 cores and 4 GiB of RAM.
>> The VM is running on a MacBook Pro with a 3.1 GHz Intel Core i7 processor,
>> 16 GB of RAM, and a SSD. I benchmarked using `silesia.tar` [3], which is
>> 211,988,480 B large. Run the following commands for the benchmark:
>> 
>> sudo modprobe zstd_compress_test
>> sudo mknod zstd_compress_test c 245 0
>> sudo cp silesia.tar zstd_compress_test
>> 
>> The time is reported by the time of the userland `cp`.
>> The MB/s is computed with
>> 
>> 1,536,217,008 B / time(buffer size, hash)
>> 
>> which includes the time to copy from userland.
>> The Adjusted MB/s is computed with
>> 
>> 1,536,217,088 B / (time(buffer size, hash) - time(buffer size, none)).
>> 
>> The memory reported is the amount of memory the compressor requests.
>> 
>> | Method   | Size (B) | Time (s) | Ratio | MB/s| Adj MB/s | Mem (MB) |
>> |--|--|--|---|-|--|--|
>> | none | 11988480 |0.100 | 1 | 2119.88 |- |- |
>> | zstd -1  | 73645762 |1.044 | 2.878 |  203.05 |   224.56 | 1.23 |
>> | zstd -3  | 66988878 |1.761 | 3.165 |  120.38 |   127.63 | 2.47 |
>> | zstd -5  | 65001259 |2.563 | 3.261 |   82.71 |86.07 | 2.86 |
>> | zstd -10 | 60165346 |   13.242 | 3.523 |   16.01 |16.13 |13.22 |
>> | zstd -15 | 58009756 |   47.601 | 3.654 |4.45 | 4.46 |21.61 |
>> | zstd -19 | 54014593 |  102.835 | 3.925 |2.06 | 2.06 |60.15 |
>> | zlib -1  | 77260026 |2.895 | 2.744 |   73.23 |75.85 | 0.27 |
>> | zlib -3  | 72972206 |4.116 | 2.905 |   51.50 |52.79 | 0.27 |
>> | zlib -6  | 68190360 |9.633 | 3.109 |   22.01 |22.24 | 0.27 |
>> | zlib -9  | 67613382 |   22.554 | 3.135 |9.40 | 9.44 | 0.27 |
>> 
> 
> These benchmarks are misleading because they compress the whole file as a
> single stream without resetting the dictionary, which isn't how data will
> typically be compressed in kernel mode.  With filesystem compression the data
> has to be divided into small chunks that can each be decompressed 
> independently.
> That eliminates one of the primary advantages of Zstandard (support for large
> dictionary sizes).

This benchmark isn't meant to be representative of a filesystem scenario. I
wanted to show off zstd without anything else going on. Even in filesystems
where the data is chunked, zstd uses the whole chunk as the window (128 KB
in BtrFS and SquashFS by default), where zlib uses 32 KB. I have benchmarks
for BtrFS and SquashFS in their respective patches [4][5], and I've copied
the BtrFS table below (which was run with 2 threads).

| Method  | Ratio | Compression MB/s | Decompression speed |
|-|---|--|-|
| None|  0.99 |  504 | 686 |
| lzo |  1.66 |

Re: [PATCH v5 2/5] lib: Add zstd modules

2017-08-10 Thread Chris Mason

On 08/10/2017 03:00 PM, Eric Biggers wrote:

On Thu, Aug 10, 2017 at 01:41:21PM -0400, Chris Mason wrote:

On 08/10/2017 04:30 AM, Eric Biggers wrote:

On Wed, Aug 09, 2017 at 07:35:53PM -0700, Nick Terrell wrote:



The memory reported is the amount of memory the compressor requests.

| Method   | Size (B) | Time (s) | Ratio | MB/s| Adj MB/s | Mem (MB) |
|--|--|--|---|-|--|--|
| none | 11988480 |0.100 | 1 | 2119.88 |- |- |
| zstd -1  | 73645762 |1.044 | 2.878 |  203.05 |   224.56 | 1.23 |
| zstd -3  | 66988878 |1.761 | 3.165 |  120.38 |   127.63 | 2.47 |
| zstd -5  | 65001259 |2.563 | 3.261 |   82.71 |86.07 | 2.86 |
| zstd -10 | 60165346 |   13.242 | 3.523 |   16.01 |16.13 |13.22 |
| zstd -15 | 58009756 |   47.601 | 3.654 |4.45 | 4.46 |21.61 |
| zstd -19 | 54014593 |  102.835 | 3.925 |2.06 | 2.06 |60.15 |
| zlib -1  | 77260026 |2.895 | 2.744 |   73.23 |75.85 | 0.27 |
| zlib -3  | 72972206 |4.116 | 2.905 |   51.50 |52.79 | 0.27 |
| zlib -6  | 68190360 |9.633 | 3.109 |   22.01 |22.24 | 0.27 |
| zlib -9  | 67613382 |   22.554 | 3.135 |9.40 | 9.44 | 0.27 |



These benchmarks are misleading because they compress the whole file as a
single stream without resetting the dictionary, which isn't how data will
typically be compressed in kernel mode.  With filesystem compression the data
has to be divided into small chunks that can each be decompressed independently.
That eliminates one of the primary advantages of Zstandard (support for large
dictionary sizes).


I did btrfs benchmarks of kernel trees and other normal data sets as
well.  The numbers were in line with what Nick is posting here.
zstd is a big win over both lzo and zlib from a btrfs point of view.

It's true Nick's patches only support a single compression level in
btrfs, but that's because btrfs doesn't have a way to pass in the
compression ratio.  It could easily be a mount option, it was just
outside the scope of Nick's initial work.



I am not surprised --- Zstandard is closer to the state of the art, both
format-wise and implementation-wise, than the other choices in BTRFS.  My point
is that benchmarks need to account for how much data is compressed at a time.
This is a common mistake when comparing different compression algorithms; the
algorithm name and compression level do not tell the whole story.  The
dictionary size is extremely significant.  No one is going to compress or
decompress a 200 MB file as a single stream in kernel mode, so it does not make
sense to justify adding Zstandard *to the kernel* based on such a benchmark.  It
is going to be divided into chunks.  How big are the chunks in BTRFS?  I thought
that it compressed only one page (4 KiB) at a time, but I hope that has been, or
is being, improved; 32 KiB - 128 KiB should be a better amount.  (And if the
amount of data compressed at a time happens to be different between the
different algorithms, note that BTRFS benchmarks are likely to be measuring that
as much as the algorithms themselves.)


Btrfs hooks the compression code into the delayed allocation mechanism 
we use to gather large extents for COW.  So if you write 100MB to a 
file, we'll have 100MB to compress at a time (within the limits of the 
amount of pages we allow to collect before forcing it down).


But we want to balance how much memory you might need to uncompress 
during random reads.  So we have an artificial limit of 128KB that we 
send at a time to the compression code.  It's easy to change this, it's 
just a tradeoff made to limit the cost of reading small bits.


It's the same for zlib, lzo and the new zstd patch.
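
For what it's worth, that 128KB boundary is visible from userspace in the 
extent layout of a compressed file (a sketch; the path is hypothetical):

# each extent of a compressed btrfs file maps at most 128 KiB of
# uncompressed data, matching the per-call limit described above
filefrag -v /mnt/btrfs/some-compressed-file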

-chris



Re: [PATCH v5 2/5] lib: Add zstd modules

2017-08-10 Thread Eric Biggers
On Thu, Aug 10, 2017 at 01:41:21PM -0400, Chris Mason wrote:
> On 08/10/2017 04:30 AM, Eric Biggers wrote:
> >On Wed, Aug 09, 2017 at 07:35:53PM -0700, Nick Terrell wrote:
> 
> >>The memory reported is the amount of memory the compressor requests.
> >>
> >>| Method   | Size (B) | Time (s) | Ratio | MB/s| Adj MB/s | Mem (MB) |
> >>|--|--|--|---|-|--|--|
> >>| none | 11988480 |0.100 | 1 | 2119.88 |- |- |
> >>| zstd -1  | 73645762 |1.044 | 2.878 |  203.05 |   224.56 | 1.23 |
> >>| zstd -3  | 66988878 |1.761 | 3.165 |  120.38 |   127.63 | 2.47 |
> >>| zstd -5  | 65001259 |2.563 | 3.261 |   82.71 |86.07 | 2.86 |
> >>| zstd -10 | 60165346 |   13.242 | 3.523 |   16.01 |16.13 |13.22 |
> >>| zstd -15 | 58009756 |   47.601 | 3.654 |4.45 | 4.46 |21.61 |
> >>| zstd -19 | 54014593 |  102.835 | 3.925 |2.06 | 2.06 |60.15 |
> >>| zlib -1  | 77260026 |2.895 | 2.744 |   73.23 |75.85 | 0.27 |
> >>| zlib -3  | 72972206 |4.116 | 2.905 |   51.50 |52.79 | 0.27 |
> >>| zlib -6  | 68190360 |9.633 | 3.109 |   22.01 |22.24 | 0.27 |
> >>| zlib -9  | 67613382 |   22.554 | 3.135 |9.40 | 9.44 | 0.27 |
> >>
> >
> >These benchmarks are misleading because they compress the whole file as a
> >single stream without resetting the dictionary, which isn't how data will
> >typically be compressed in kernel mode.  With filesystem compression the data
> >has to be divided into small chunks that can each be decompressed 
> >independently.
> >That eliminates one of the primary advantages of Zstandard (support for large
> >dictionary sizes).
> 
> I did btrfs benchmarks of kernel trees and other normal data sets as
> well.  The numbers were in line with what Nick is posting here.
> zstd is a big win over both lzo and zlib from a btrfs point of view.
> 
> It's true Nick's patches only support a single compression level in
> btrfs, but that's because btrfs doesn't have a way to pass in the
> compression ratio.  It could easily be a mount option, it was just
> outside the scope of Nick's initial work.
> 

I am not surprised --- Zstandard is closer to the state of the art, both
format-wise and implementation-wise, than the other choices in BTRFS.  My point
is that benchmarks need to account for how much data is compressed at a time.
This is a common mistake when comparing different compression algorithms; the
algorithm name and compression level do not tell the whole story.  The
dictionary size is extremely significant.  No one is going to compress or
decompress a 200 MB file as a single stream in kernel mode, so it does not make
sense to justify adding Zstandard *to the kernel* based on such a benchmark.  It
is going to be divided into chunks.  How big are the chunks in BTRFS?  I thought
that it compressed only one page (4 KiB) at a time, but I hope that has been, or
is being, improved; 32 KiB - 128 KiB should be a better amount.  (And if the
amount of data compressed at a time happens to be different between the
different algorithms, note that BTRFS benchmarks are likely to be measuring that
as much as the algorithms themselves.)

Eric


Re: [PATCH v5 2/5] lib: Add zstd modules

2017-08-10 Thread Austin S. Hemmelgarn

On 2017-08-10 13:24, Eric Biggers wrote:

On Thu, Aug 10, 2017 at 07:32:18AM -0400, Austin S. Hemmelgarn wrote:

On 2017-08-10 04:30, Eric Biggers wrote:

On Wed, Aug 09, 2017 at 07:35:53PM -0700, Nick Terrell wrote:


It can compress at speeds approaching lz4, and quality approaching lzma.


Well, for a very loose definition of "approaching", and certainly not at the
same time.  I doubt there's a use case for using the highest compression levels
in kernel mode --- especially the ones using zstd_opt.h.

Large data-sets with WORM access patterns and infrequent writes
immediately come to mind as a use case for the highest compression
level.

As a more specific example, the company I work for has a very large
amount of documentation, and we keep all old versions.  This is all
stored on a file server which is currently using BTRFS.  Once a
document is written, it's almost never rewritten, so write
performance only matters for the first write.  However, they're read
back pretty frequently, so we need good read performance.  As of
right now, the system is set to use LZO compression by default, and
then when a new document is added, the previous version of that
document gets re-compressed using zlib compression, which actually
results in pretty significant space savings most of the time.  I
would absolutely love to use zstd compression with this system with
the highest compression level, because most people don't care how
long it takes to write the file out, but they do care how long it
takes to read a file (even if it's an older version).


This may be a reasonable use case, but note this cannot just be the regular
"zstd" compression setting, since filesystem compression by default must provide
reasonable performance for many different access patterns.  See the patch in
this series which actually adds zstd compression to btrfs; it only uses level 1.
I do not see a patch which adds a higher compression mode.  It would need to be
a special setting like "zstdhc" that users could opt-in to on specific
directories.  It also would need to be compared to simply compressing in
userspace.  In many cases compressing in userspace is probably the better
solution for the use case in question because it works on any filesystem, allows
using any compression algorithm, and if random access is not needed it is
possible to compress each file as a single stream (like a .xz file), which
produces a much better compression ratio than the block-by-block compression
that filesystems have to use.
There has been discussion as well as (I think) initial patches merged 
for support of specifying the compression level for algorithms which 
support multiple compression levels in BTRFS.  I was actually under the 
impression that we had decided to use level 3 as the default for zstd, 
but that apparently isn't the case, and with the benchmark issues, it 
may not be once proper benchmarks are run.


Also, on the note of compressing in userspace, the use case I quoted at 
least can't do that because we have to deal with Windows clients and 
users have to be able to open files directly on said Windows clients.  I 
entirely agree that real archival storage is better off using userspace 
compression, but sometimes real archival storage isn't an option.


Note also that LZ4HC is in the kernel source tree currently but no one is using
it vs. the regular LZ4.  I think it is the kind of thing that sounded useful
originally, but at the end of the day no one really wants to use it in kernel
mode.  I'd certainly be interested in actual patches, though.
Part of that is the fact that BTRFS is one of the only consumers (AFAIK) 
of this API that can freely choose all aspects of their usage, and the 
consensus here (which I don't agree with, I might add) amounts to the 
argument that 'we already have  compression with a  compression 
ratio, we don't need more things like that'.  I would personally love to 
see LZ4HC support in BTRFS (based on testing my own use cases, LZ4 is 
more deterministic than LZO for both compression and decompression, and 
most of the non archival usage I have of BTRFS benefits from 
determinism), but there's not any point in me writing up such a patch 
because it's almost certain to get rejected because BTRFS already has 
LZO.  The main reason that zstd is getting considered at all is that the 
quoted benchmarks show clear benefits in decompression speed relative to 
zlib and far better compression ratios than LZO.



Re: [PATCH v5 2/5] lib: Add zstd modules

2017-08-10 Thread Chris Mason

On 08/10/2017 04:30 AM, Eric Biggers wrote:

On Wed, Aug 09, 2017 at 07:35:53PM -0700, Nick Terrell wrote:



The memory reported is the amount of memory the compressor requests.

| Method   | Size (B) | Time (s) | Ratio | MB/s| Adj MB/s | Mem (MB) |
|--|--|--|---|-|--|--|
| none | 11988480 |0.100 | 1 | 2119.88 |- |- |
| zstd -1  | 73645762 |1.044 | 2.878 |  203.05 |   224.56 | 1.23 |
| zstd -3  | 66988878 |1.761 | 3.165 |  120.38 |   127.63 | 2.47 |
| zstd -5  | 65001259 |2.563 | 3.261 |   82.71 |86.07 | 2.86 |
| zstd -10 | 60165346 |   13.242 | 3.523 |   16.01 |16.13 |13.22 |
| zstd -15 | 58009756 |   47.601 | 3.654 |4.45 | 4.46 |21.61 |
| zstd -19 | 54014593 |  102.835 | 3.925 |2.06 | 2.06 |60.15 |
| zlib -1  | 77260026 |2.895 | 2.744 |   73.23 |75.85 | 0.27 |
| zlib -3  | 72972206 |4.116 | 2.905 |   51.50 |52.79 | 0.27 |
| zlib -6  | 68190360 |9.633 | 3.109 |   22.01 |22.24 | 0.27 |
| zlib -9  | 67613382 |   22.554 | 3.135 |9.40 | 9.44 | 0.27 |



These benchmarks are misleading because they compress the whole file as a
single stream without resetting the dictionary, which isn't how data will
typically be compressed in kernel mode.  With filesystem compression the data
has to be divided into small chunks that can each be decompressed independently.
That eliminates one of the primary advantages of Zstandard (support for large
dictionary sizes).


I did btrfs benchmarks of kernel trees and other normal data sets as 
well.  The numbers were in line with what Nick is posting here.  zstd is 
a big win over both lzo and zlib from a btrfs point of view.


It's true Nick's patches only support a single compression level in 
btrfs, but that's because btrfs doesn't have a way to pass in the 
compression ratio.  It could easily be a mount option, it was just 
outside the scope of Nick's initial work.


-chris





Re: [PATCH v5 2/5] lib: Add zstd modules

2017-08-10 Thread Eric Biggers
On Thu, Aug 10, 2017 at 10:57:01AM -0400, Austin S. Hemmelgarn wrote:
> Also didn't think to mention this, but I could see the max level
> being very popular for use with SquashFS root filesystems used in
> LiveCD's. Currently, they have to decide between read performance
> and image size, while zstd would provide both.

The high compression levels of Zstandard are indeed a great fit for SquashFS,
but SquashFS images are created in userspace by squashfs-tools.  The kernel only
needs to be able to decompress them.
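
For reference, the userspace step would be along these lines once 
squashfs-tools grows matching zstd support (a sketch; the zstd option names 
are assumptions modelled on the existing -comp/-X options):

# build the LiveCD root image entirely in userspace; the kernel only
# needs the zstd decompressor to mount the result read-only
mksquashfs rootfs/ root.squashfs -comp zstd -Xcompression-level 19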

(Also, while Zstandard provides very good tradeoffs and will probably become the
preferred algorithm for SquashFS, it's misleading to imply that users won't have
to make decisions anymore.  It does not compress as well as XZ or decompress as
fast as LZ4, except maybe in very carefully crafted benchmarks.)

Eric


Re: [PATCH v5 2/5] lib: Add zstd modules

2017-08-10 Thread Eric Biggers
On Thu, Aug 10, 2017 at 07:32:18AM -0400, Austin S. Hemmelgarn wrote:
> On 2017-08-10 04:30, Eric Biggers wrote:
> >On Wed, Aug 09, 2017 at 07:35:53PM -0700, Nick Terrell wrote:
> >>
> >>It can compress at speeds approaching lz4, and quality approaching lzma.
> >
> >Well, for a very loose definition of "approaching", and certainly not at the
> >same time.  I doubt there's a use case for using the highest compression 
> >levels
> >in kernel mode --- especially the ones using zstd_opt.h.
> Large data-sets with WORM access patterns and infrequent writes
> immediately come to mind as a use case for the highest compression
> level.
> 
> As a more specific example, the company I work for has a very large
> amount of documentation, and we keep all old versions.  This is all
> stored on a file server which is currently using BTRFS.  Once a
> document is written, it's almost never rewritten, so write
> performance only matters for the first write.  However, they're read
> back pretty frequently, so we need good read performance.  As of
> right now, the system is set to use LZO compression by default, and
> then when a new document is added, the previous version of that
> document gets re-compressed using zlib compression, which actually
> results in pretty significant space savings most of the time.  I
> would absolutely love to use zstd compression with this system with
> the highest compression level, because most people don't care how
> long it takes to write the file out, but they do care how long it
> takes to read a file (even if it's an older version).

This may be a reasonable use case, but note this cannot just be the regular
"zstd" compression setting, since filesystem compression by default must provide
reasonable performance for many different access patterns.  See the patch in
this series which actually adds zstd compression to btrfs; it only uses level 1.
I do not see a patch which adds a higher compression mode.  It would need to be
a special setting like "zstdhc" that users could opt-in to on specific
directories.  It also would need to be compared to simply compressing in
userspace.  In many cases compressing in userspace is probably the better
solution for the use case in question because it works on any filesystem, allows
using any compression algorithm, and if random access is not needed it is
possible to compress each file as a single stream (like a .xz file), which
produces a much better compression ratio than the block-by-block compression
that filesystems have to use.
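
A sketch of that userspace alternative (file names hypothetical); compressing 
each document as a single stream gives the compressor the whole file as its 
window, unlike the small blocks a filesystem must use:

# whole-file, single-stream compression in userspace; works on any filesystem
xz -9 -k old-revision.odt        # produces old-revision.odt.xz
zstd -19 -k old-revision.odt     # produces old-revision.odt.zst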

Note also that LZ4HC is in the kernel source tree currently but no one is using
it vs. the regular LZ4.  I think it is the kind of thing that sounded useful
originally, but at the end of the day no one really wants to use it in kernel
mode.  I'd certainly be interested in actual patches, though.

Eric


Re: [PATCH v5 2/5] lib: Add zstd modules

2017-08-10 Thread Austin S. Hemmelgarn

On 2017-08-10 07:32, Austin S. Hemmelgarn wrote:

On 2017-08-10 04:30, Eric Biggers wrote:

On Wed, Aug 09, 2017 at 07:35:53PM -0700, Nick Terrell wrote:


It can compress at speeds approaching lz4, and quality approaching lzma.


Well, for a very loose definition of "approaching", and certainly not 
at the
same time.  I doubt there's a use case for using the highest 
compression levels

in kernel mode --- especially the ones using zstd_opt.h.
Large data-sets with WORM access patterns and infrequent writes 
immediately come to mind as a use case for the highest compression level.


As a more specific example, the company I work for has a very large 
amount of documentation, and we keep all old versions.  This is all 
stored on a file server which is currently using BTRFS.  Once a document 
is written, it's almost never rewritten, so write performance only 
matters for the first write.  However, they're read back pretty 
frequently, so we need good read performance.  As of right now, the 
system is set to use LZO compression by default, and then when a new 
document is added, the previous version of that document gets 
re-compressed using zlib compression, which actually results in pretty 
significant space savings most of the time.  I would absolutely love to 
use zstd compression with this system with the highest compression 
level, because most people don't care how long it takes to write the 
file out, but they do care how long it takes to read a file (even if 
it's an older version).
Also, I didn't think to mention this, but I could see the max level being 
very popular for use with SquashFS root filesystems used in LiveCDs. 
Currently, they have to decide between read performance and image size, 
while zstd would provide both.



Re: Btrfs umount hang

2017-08-10 Thread Angel Shtilianov
Here is a fresh snapshot hang example:

Here kworker/u24:0 is blocked waiting on btrfs_tree_read_lock.
And again the underlying call is btrfs_search_slot after it has gone through
cache_block_group.
The other two are blocked at the same place in cache_block_group, so I
can confirm it is hanging the same way as with umount.
Here is the struct extent_buffer passed to btrfs_tree_read_lock for
kworker/u24:0:
struct extent_buffer {
...
  lock_owner = 23286,
  write_locks = {
counter = 1
  },
  read_locks = {
counter = 0
  },
  blocking_writers = {
counter = 1
  },
..
}
So again the ioctl has taken the lock needed for its underlying operation.
Btrfs pid 23286 is going to perform a write, so it can't skip locking,
and not just read locking, but write locking.
The fact that it owns the wr lock is good. skip_locking in its
btrfs_path struct is zero, so up to here it is fine and locked. The
problem arises later, if the cache isn't available. Then we are about to
try cache_block_group. Even if it does not hang on the
wait at the beginning of the cache_block_group function (as in the current case):
while (cache->cached == BTRFS_CACHE_FAST) {
        struct btrfs_caching_control *ctl;

        ctl = cache->caching_ctl;
        atomic_inc(&ctl->count);
        prepare_to_wait(&ctl->wait, &wait, TASK_UNINTERRUPTIBLE);
        spin_unlock(&cache->lock);

        schedule();

        finish_wait(&ctl->wait, &wait);
        put_caching_control(ctl);
        spin_lock(&cache->lock);
}
it seems that there is a high chance of hanging later on
btrfs_tree_read_lock even if there is absolutely no concurrent work
going on, by calling:
cache_block_group
  ->load_free_space_cache
    ->lookup_free_space_inode
      ->__lookup_free_space_inode
        ->btrfs_search_slot
          ->btrfs_read_lock_root_node
            ->btrfs_tree_read_lock
since we are already holding the write lock.
It looks like if we call cache_block_group from find_free_extent, itself
called via btrfs_search_slot, we will deadlock too in case
the data isn't cached.

It looks scary. I hope that I've missed something.

Best regards,
Angel

On Thu, Aug 10, 2017 at 1:30 AM, Angel Shtilianov  wrote:
> Hi Jeff,
> That's exactly what I am observing in the backtraces.
> umount reaches cache_block_group passing through btrfs_search_slot,
> which looks like got the wr_lock (to be confirmed tomorrow when I'm in
> the office) and all other btrfs worker threads are fighting for the
> btrfs_tree_read_lock.
> Umount is hanging on the cache_block_group wait, all others are
> hanging on the read_lock. So caching seems to be the bottleneck,
> blocking all others. Good catch! I'll take a look at the commit
> tomorrow morning (with large cup of coffee of course).
>
> By the way, looking through the logs I can see one more pattern -
> umount after drop snapshot reaches a similar state. The common point is that
> drop snapshot will also call search slot, get the read lock, drop it
> and get the write lock. At the same time there is an umount causing
> fs_sync to run concurrently. I've seen at least three such hangs today
> and I've started the corresponding crashdumps to get copied to my
> machine. Tomorrow I'll take a look at them. It seems the common point is
> that umount again reaches the same place - cache_block_group - and
> hangs. Tomorrow I'll take a look at the crash dump for the workers'
> state.
>
> Thank you,
> Angel
>
> On Wed, Aug 9, 2017 at 6:42 PM, Jeff Mahoney  wrote:
>> On 8/8/17 7:30 AM, Angel Shtilianov wrote:
>>> crash> bt -f 31625
>>> PID: 31625  TASK: 88046a833400  CPU: 7   COMMAND: "btrfs-transacti"
>>> wants to acquire struct extent_buffer 88000460aca0 lock, whose
>>> lock_owner is 27574.
>>>
>>> here is pid 27574:
>>> PID: 27574  TASK: 88038b469a00  CPU: 4   COMMAND: "kworker/u32:9"
>>> which is also is trying to acquire eb lock 8802598b6200, and here
>>> the owner is 31696.
>>>
>>> 31696 is
>>> PID: 31696  TASK: 88044b59ce00  CPU: 5   COMMAND: "umount"
>>>
>>> So there is definitely a kind of deadlock here.
>>> umount holds the lock needed by the workers to complete and waits for
>>> them to complete.
>>> Lockdep wouldn't complain about that.
>>> I am still about to investigate what has previously triggered/disabled 
>>> lockdep.
>>> I have to obtain the log from the machine, but I need some time to get it.
>>>
>>> Jeff, you were right.
>>> Could you help demystify how we ended up here?
>>
>> Hi Angel -
>>
>> It looks like a regression introduced by 291c7d2f5, but that's a very
>> old commit.  As that commit says, it's a rare occurrence to hit that
>> wait, and that's probably why we haven't seen this issue sooner.
>>
>> There's potential for this to happen whenever two threads are modifying
>> the tree at once and one needs to find a free extent.  I'll need to
>> think a bit on how to fix it.
>>
>> -Jeff
>>
>>> Best regards,
>>> Angel
>>>
>>> On Mon, Aug 7, 2017 at 9:10 PM, Jeff Mahoney 

Re: [PATCH v5 2/5] lib: Add zstd modules

2017-08-10 Thread Austin S. Hemmelgarn

On 2017-08-10 04:30, Eric Biggers wrote:

On Wed, Aug 09, 2017 at 07:35:53PM -0700, Nick Terrell wrote:


It can compress at speeds approaching lz4, and quality approaching lzma.


Well, for a very loose definition of "approaching", and certainly not at the
same time.  I doubt there's a use case for using the highest compression levels
in kernel mode --- especially the ones using zstd_opt.h.
Large data-sets with WORM access patterns and infrequent writes 
immediately come to mind as a use case for the highest compression level.


As a more specific example, the company I work for has a very large 
amount of documentation, and we keep all old versions.  This is all 
stored on a file server which is currently using BTRFS.  Once a document 
is written, it's almost never rewritten, so write performance only 
matters for the first write.  However, they're read back pretty 
frequently, so we need good read performance.  As of right now, the 
system is set to use LZO compression by default, and then when a new 
document is added, the previous version of that document gets 
re-compressed using zlib compression, which actually results in pretty 
significant space savings most of the time.  I would absolutely love to 
use zstd compression with this system with the highest compression 
level, because most people don't care how long it takes to write the 
file out, but they do care how long it takes to read a file (even if 
it's an older version).
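
A sketch of what that per-file re-compression step can look like on btrfs (the 
path is hypothetical); defragment's -c option forces the chosen algorithm onto 
the rewritten extents:

# rewrite the old revision with zlib compression; new revisions keep the
# subvolume-wide lzo default set via the compress mount option
btrfs filesystem defragment -czlib /srv/docs/archive/spec-v41.odt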




The code was ported from the upstream zstd source repository.


What version?


`linux/zstd.h` header was modified to match linux kernel style.
The cross-platform and allocation code was stripped out. Instead zstd
requires the caller to pass a preallocated workspace. The source files
were clang-formatted [1] to match the Linux Kernel style as much as
possible.


It would be easier to compare to the upstream version if it was not all
reformatted.  There is a chance that bugs were introduced by Linux-specific
changes, and it would be nice if they could be easily reviewed.  (Also I don't
know what clang-format settings you used, but there are still a lot of
differences from the Linux coding style.)



I benchmarked zstd compression as a special character device. I ran zstd
and zlib compression at several levels, as well as performing no
compression, which measure the time spent copying the data to kernel space.
Data is passed to the compressor 4096 B at a time. The benchmark file is
located in the upstream zstd source repository under
`contrib/linux-kernel/zstd_compress_test.c` [2].

I ran the benchmarks on a Ubuntu 14.04 VM with 2 cores and 4 GiB of RAM.
The VM is running on a MacBook Pro with a 3.1 GHz Intel Core i7 processor,
16 GB of RAM, and a SSD. I benchmarked using `silesia.tar` [3], which is
211,988,480 B large. Run the following commands for the benchmark:

 sudo modprobe zstd_compress_test
 sudo mknod zstd_compress_test c 245 0
 sudo cp silesia.tar zstd_compress_test

The time is reported by the time of the userland `cp`.
The MB/s is computed with

 1,536,217,008 B / time(buffer size, hash)

which includes the time to copy from userland.
The Adjusted MB/s is computed with

 1,536,217,088 B / (time(buffer size, hash) - time(buffer size, none)).

The memory reported is the amount of memory the compressor requests.

| Method   | Size (B) | Time (s) | Ratio | MB/s| Adj MB/s | Mem (MB) |
|--|--|--|---|-|--|--|
| none | 11988480 |0.100 | 1 | 2119.88 |- |- |
| zstd -1  | 73645762 |1.044 | 2.878 |  203.05 |   224.56 | 1.23 |
| zstd -3  | 66988878 |1.761 | 3.165 |  120.38 |   127.63 | 2.47 |
| zstd -5  | 65001259 |2.563 | 3.261 |   82.71 |86.07 | 2.86 |
| zstd -10 | 60165346 |   13.242 | 3.523 |   16.01 |16.13 |13.22 |
| zstd -15 | 58009756 |   47.601 | 3.654 |4.45 | 4.46 |21.61 |
| zstd -19 | 54014593 |  102.835 | 3.925 |2.06 | 2.06 |60.15 |
| zlib -1  | 77260026 |2.895 | 2.744 |   73.23 |75.85 | 0.27 |
| zlib -3  | 72972206 |4.116 | 2.905 |   51.50 |52.79 | 0.27 |
| zlib -6  | 68190360 |9.633 | 3.109 |   22.01 |22.24 | 0.27 |
| zlib -9  | 67613382 |   22.554 | 3.135 |9.40 | 9.44 | 0.27 |



These benchmarks are misleading because they compress the whole file as a
single stream without resetting the dictionary, which isn't how data will
typically be compressed in kernel mode.  With filesystem compression the data
has to be divided into small chunks that can each be decompressed independently.
That eliminates one of the primary advantages of Zstandard (support for large
dictionary sizes).



Re: [PATCH v3 11/49] btrfs: avoid access to .bi_vcnt directly

2017-08-10 Thread Christoph Hellwig
> +static unsigned int get_bio_pages(struct bio *bio)
> +{
> + unsigned i;
> + struct bio_vec *bv;
> +
> + bio_for_each_segment_all(bv, bio, i)
> + ;
> +
> + return i;
> +}

s/get_bio_pages/bio_nr_pages/ ?

Also this seems like a useful helper for bio.h


Re: [PATCH v5 2/5] lib: Add zstd modules

2017-08-10 Thread Eric Biggers
On Wed, Aug 09, 2017 at 07:35:53PM -0700, Nick Terrell wrote:
>
> It can compress at speeds approaching lz4, and quality approaching lzma.

Well, for a very loose definition of "approaching", and certainly not at the
same time.  I doubt there's a use case for using the highest compression levels
in kernel mode --- especially the ones using zstd_opt.h.

> 
> The code was ported from the upstream zstd source repository.

What version?

> `linux/zstd.h` header was modified to match linux kernel style.
> The cross-platform and allocation code was stripped out. Instead zstd
> requires the caller to pass a preallocated workspace. The source files
> were clang-formatted [1] to match the Linux Kernel style as much as
> possible. 

It would be easier to compare to the upstream version if it was not all
reformatted.  There is a chance that bugs were introduced by Linux-specific
changes, and it would be nice if they could be easily reviewed.  (Also I don't
know what clang-format settings you used, but there are still a lot of
differences from the Linux coding style.)

> 
> I benchmarked zstd compression as a special character device. I ran zstd
> and zlib compression at several levels, as well as performing no
> compression, which measure the time spent copying the data to kernel space.
> Data is passed to the compressor 4096 B at a time. The benchmark file is
> located in the upstream zstd source repository under
> `contrib/linux-kernel/zstd_compress_test.c` [2].
> 
> I ran the benchmarks on a Ubuntu 14.04 VM with 2 cores and 4 GiB of RAM.
> The VM is running on a MacBook Pro with a 3.1 GHz Intel Core i7 processor,
> 16 GB of RAM, and a SSD. I benchmarked using `silesia.tar` [3], which is
> 211,988,480 B large. Run the following commands for the benchmark:
> 
> sudo modprobe zstd_compress_test
> sudo mknod zstd_compress_test c 245 0
> sudo cp silesia.tar zstd_compress_test
> 
> The time is reported by the time of the userland `cp`.
> The MB/s is computed with
> 
> 1,536,217,008 B / time(buffer size, hash)
> 
> which includes the time to copy from userland.
> The Adjusted MB/s is computed with
> 
> 1,536,217,088 B / (time(buffer size, hash) - time(buffer size, none)).
> 
> The memory reported is the amount of memory the compressor requests.
> 
> | Method   | Size (B) | Time (s) | Ratio | MB/s| Adj MB/s | Mem (MB) |
> |--|--|--|---|-|--|--|
> | none | 11988480 |0.100 | 1 | 2119.88 |- |- |
> | zstd -1  | 73645762 |1.044 | 2.878 |  203.05 |   224.56 | 1.23 |
> | zstd -3  | 66988878 |1.761 | 3.165 |  120.38 |   127.63 | 2.47 |
> | zstd -5  | 65001259 |2.563 | 3.261 |   82.71 |86.07 | 2.86 |
> | zstd -10 | 60165346 |   13.242 | 3.523 |   16.01 |16.13 |13.22 |
> | zstd -15 | 58009756 |   47.601 | 3.654 |4.45 | 4.46 |21.61 |
> | zstd -19 | 54014593 |  102.835 | 3.925 |2.06 | 2.06 |60.15 |
> | zlib -1  | 77260026 |2.895 | 2.744 |   73.23 |75.85 | 0.27 |
> | zlib -3  | 72972206 |4.116 | 2.905 |   51.50 |52.79 | 0.27 |
> | zlib -6  | 68190360 |9.633 | 3.109 |   22.01 |22.24 | 0.27 |
> | zlib -9  | 67613382 |   22.554 | 3.135 |9.40 | 9.44 | 0.27 |
> 

These benchmarks are misleading because they compress the whole file as a
single stream without resetting the dictionary, which isn't how data will
typically be compressed in kernel mode.  With filesystem compression the data
has to be divided into small chunks that can each be decompressed independently.
That eliminates one of the primary advantages of Zstandard (support for large
dictionary sizes).

Eric