On Wed, 15 Nov 2017 08:11:04 +0100,
waxhead <waxh...@dirtcellar.net> wrote:

> As for dedupe there is (to my knowledge) nothing fully automatic yet. 
> You have to run a program to scan your filesystem but all the 
> deduplication is done in the kernel.
> duperemove apparently worked quite well when I tested it, but there
> may be some performance implications.

There's bees, a near-line deduplication tool: it watches for
generation changes in the filesystem and walks the changed inodes. It
only looks at extents, not at whole files. Deduplication itself is
delegated to the kernel, which ensures all changes are data-safe. It
runs as a daemon and processes your changes in realtime (delayed by
a few seconds to minutes of course, due to the transaction commit and
hashing phases).
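
For the curious, the scan-and-match part can be sketched in a few
lines of Python. This is only an illustration of the block-hashing
idea (fixed-size blocks standing in for real extents), not bees'
actual code; in bees the matched ranges are then handed to the kernel
for the safe, byte-verified dedupe step:

```python
import hashlib

BLOCK = 4096  # stand-in for an extent; bees tracks real extents

def block_hashes(path):
    """Yield (offset, digest) for each fixed-size block of a file."""
    with open(path, "rb") as f:
        offset = 0
        while True:
            data = f.read(BLOCK)
            if not data:
                break
            yield offset, hashlib.sha256(data).digest()
            offset += len(data)

def find_duplicates(paths):
    """Map block digest -> list of (path, offset).

    Any digest with more than one location is a dedupe candidate
    that a tool like bees would submit to the kernel.
    """
    table = {}
    for path in paths:
        for offset, digest in block_hashes(path):
            table.setdefault(digest, []).append((path, offset))
    return {d: locs for d, locs in table.items() if len(locs) > 1}
```

A daemon like bees keeps such a table persistently (that's the locked
hash table mentioned below) and the kernel re-verifies each candidate
pair byte-for-byte before sharing the extents, so a hash collision
cannot corrupt data.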

You need to dedicate part of your RAM to it; around 1 GB is
usually sufficient to work well enough. The RAM is locked and
cannot be swapped out, so you should have a sufficiently equipped
system.

Works very well here (2TB of data, 1GB hash table, 16GB RAM).
Newly duplicated files are picked up within seconds, scanned (hitting
the cache most of the time, thus not requiring physical IO), and then
submitted to the kernel for deduplication.

I'd call that fully automatic: Once set up, it just works, and works
well. Performance impact is very low once the initial scan is done.

https://github.com/Zygo/bees


-- 
Regards,
Kai

Replies to list-only preferred.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
