On Fri, Dec 9, 2016 at 11:16 AM, Darrick J. Wong
<darrick.w...@oracle.com> wrote:
> [adding mark fasheh (duperemove maintainer) to cc]
>
> On Fri, Dec 09, 2016 at 07:29:21AM -0500, Austin S. Hemmelgarn wrote:
>> On 2016-12-08 21:54, Chris Murphy wrote:
>> >On Thu, Dec 8, 2016 at 7:26 PM, Darrick J. Wong <darrick.w...@oracle.com> wrote:
>> >>On Thu, Dec 08, 2016 at 05:45:40PM -0700, Chris Murphy wrote:
>> >>>OK something's wrong.
>> >>>
>> >>>Kernel 4.8.12 and duperemove v0.11.beta4. Brand new file system
>> >>>(mkfs.btrfs -dsingle -msingle, default mount options) and two
>> >>>identical files separately copied.
>> >>>
>> >>>[chris@f25s]$ ls -li /mnt/test
>> >>>total 2811904
>> >>>260 -rw-r--r--. 1 root root 1439694848 Dec  8 17:26 Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso
>> >>>259 -rw-r--r--. 1 root root 1439694848 Dec  8 17:26 Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2
>> >>>
>> >>>[chris@f25s]$ filefrag /mnt/test/*
>> >>>/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso: 3 extents found
>> >>>/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2: 2 extents found
>> >>>
>> >>>
>> >>>[chris@f25s duperemove]$ sudo ./duperemove -dv /mnt/test/*
>> >>>Using 128K blocks
>> >>>Using hash: murmur3
>> >>>Gathering file list...
>> >>>Using 4 threads for file hashing phase
>> >>>[1/2] (50.00%) csum: /mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso
>> >>>[2/2] (100.00%) csum: /mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2
>> >>>Total files:  2
>> >>>Total hashes: 21968
>> >>>Loading only duplicated hashes from hashfile.
>> >>>Using 4 threads for dedupe phase
>> >>>[0xba8400] (00001/10947) Try to dedupe extents with id e47862ea
>> >>>[0xba84a0] (00003/10947) Try to dedupe extents with id ffed44f2
>> >>>[0xba84f0] (00002/10947) Try to dedupe extents with id ffeefcdd
>> >>>[0xba8540] (00004/10947) Try to dedupe extents with id ffe4cf64
>> >>>[0xba8540] Add extent for file "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso" at offset 1182924800 (4)
>> >>>[0xba8540] Add extent for file "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2" at offset 1182924800 (5)
>> >>>[0xba8540] Dedupe 1 extents (id: ffe4cf64) with target: (1182924800, 131072), "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso"
>> >>
>> >>Ew, it's deduping these two 1.4GB files 128K at a time, which results in
>> >>12000 ioctl calls.  Each of those 12000 calls has to lock the two
>> >>inodes, read the file contents, remap the blocks, etc., instead of
>> >>finding the maximal identical range and making a single call for the
>> >>whole range.
>> >>
>> >>That's probably why it's taking forever to dedupe.
>> >
>> >Yes, but it looks like it's also heavily fragmenting the files as a
>> >result.
>
> I'm not sure why btrfs has that behavior... XFS doesn't do that, and
> evidently there's a bug in ocfs2 such that it sometimes merges records
> and sometimes does not.  Hmm, I'll have to take a second look at ocfs2.

I don't know if it's a kernel regression or a duperemove regression,
but I'm reasonably certain it's a regression, because I used a kernel
circa 4.6 and duperemove 0.10 in June and it did not do this; or at
least it was not this verbose, with thousands of entries per file,
even with -v. I must've deduped 300GiB inside of 30 minutes. So for
two 1.4GiB ISOs to take more than 10 minutes to dedupe is not at all
what I'd expect.
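
To make the "maximal identical range" idea quoted above concrete, here
is a rough Python sketch. It is not duperemove's code: blocks_in() and
coalesce() are made-up names, and the list of matching blocks is built
by hand on the assumption that every 128K block of the two ISOs matches
at the same offset. It only shows how per-block matches could be merged
into contiguous ranges before any dedupe request is issued.

```python
BLOCK = 128 * 1024  # duperemove reported "Using 128K blocks"


def blocks_in(size, block=BLOCK):
    """Number of block-sized pieces needed to cover `size` bytes."""
    return -(-size // block)  # ceiling division


def coalesce(matches, block=BLOCK):
    """Merge per-block matches into maximal contiguous ranges.

    `matches` is an iterable of (src_offset, dst_offset) pairs, one per
    identical block. Returns a list of (src_offset, dst_offset, length).
    Assumes every match covers a full block (hypothetical simplification).
    """
    ranges = []
    for src, dst in sorted(matches):
        if ranges:
            s, d, n = ranges[-1]
            # Extend the previous range only if both sides are contiguous.
            if s + n == src and d + n == dst:
                ranges[-1] = (s, d, n + block)
                continue
        ranges.append((src, dst, block))
    return ranges


size = 1439694848  # file size from the ls output
n = blocks_in(size)
# Assumption: the files are byte-identical, so block i matches block i.
matches = [(i * BLOCK, i * BLOCK) for i in range(n)]
ranges = coalesce(matches)
print(n, len(ranges), ranges[0])
```

With these inputs the per-block approach needs 10984 requests (which is
half of the 21968 "Total hashes" reported for the two files), while the
merged list has a single entry covering the whole file. A real tool
would probably still have to loop, since the kernel may limit how many
bytes a single dedupe call will process.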




-- 
Chris Murphy