On Wed, Jan 5, 2011 at 11:46 AM, Josef Bacik <jo...@redhat.com> wrote:
> Dedup is only useful if you _know_ you are going to have duplicate
> information,
> so the two major usecases that come to mind are
>
> 1) Mail server.  You have small files, probably less than 4k (the blocksize),
> and you are storing hundreds to thousands of them.  Dedup would be a good fit
> here, but you'd need a small dedup blocksize for it to be useful.
>
> 2) Virtualized guests.  If you have 5 different RHEL5 virt guests, chances are
> you are going to share data between them, but unlike with the mail server
> example, you are likely to find much larger chunks that are the same, so you'd
> want a larger dedup blocksize, say 64k.  You want this because with just 4k
> you'd end up with a ridiculous amount of fragmentation and performance would
> go down the toilet, so the larger dedup blocksize is needed to keep
> performance reasonable.
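
To make the blocksize trade-off above concrete, here is a minimal,
hypothetical Python sketch of fixed-blocksize, hash-keyed dedup.  It is not
how btrfs or ZFS actually do it (both key on per-block checksums held in
filesystem metadata), and the dedup() helper plus the fake "guest image"
buffers are made up purely for illustration:

    import hashlib

    def dedup(data, store, blocksize=4096):
        """Split data into fixed-size blocks and store each unique block once.

        'store' maps sha256 digest -> block bytes; the return value is the
        ordered list of digests needed to reconstruct 'data'.
        """
        block_map = []
        for off in range(0, len(data), blocksize):
            block = data[off:off + blocksize]
            digest = hashlib.sha256(block).hexdigest()
            store.setdefault(digest, block)  # first copy stored; later copies just reference it
            block_map.append(digest)
        return block_map

    # Two fake "guest images" that share most of their content.
    common = b"\xaa" * 1_000_000
    guest1 = common + b"guest-one-config"
    guest2 = common + b"guest-two-config"

    store = {}
    for g in (guest1, guest2):
        dedup(g, store, blocksize=65536)

    logical = len(guest1) + len(guest2)
    physical = sum(len(b) for b in store.values())
    print(f"stored {physical} of {logical} bytes ({logical / physical:.0f}x)")

A small blocksize finds the duplicates hiding in tiny mail files but leaves
many more extents to track (hence the fragmentation), while a large blocksize
keeps big images contiguous but misses duplicates smaller than one block.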

You missed the most obvious, and most useful, use case for dedupe: a
central backup server.

Our current backup server does an rsync backup of 127 servers every
night into a single ZFS pool.  90+ of those servers are identical
Debian installs (school servers), 20-odd of those are identical
FreeBSD installs (firewalls/routers), and the rest are mail/web/db
servers (Debian, Ubuntu, RedHat, Windows).

Just as a test, we copied a week of backups to a Linux box running
ZFS-fuse with dedupe enabled, and had a combined dedupe/compress
ratio in the low double digits (11x or 12x, something like that).  Now
we're just waiting for ZFSv22+ to hit FreeBSD to enable dedupe on the
backups server.
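
For reference, the ratio ZFS reports is logical bytes referenced divided by
physical bytes actually allocated, so 11x means roughly eleven nights' worth
of data fitting in the space of one.  A back-of-the-envelope sketch of that
arithmetic, with made-up numbers:

    # Hypothetical figures only, to show how the combined ratio is computed.
    nightly_logical = 127 * 10e9          # assume ~10 GB referenced per server per night
    week_logical = 7 * nightly_logical    # ~8.9 TB referenced across the week
    # Identical installs and unchanged files dedupe to a single physical copy,
    # and compression shrinks what remains; assume ~0.75 TB lands on disk.
    week_physical = 0.75e12
    print(f"combined ratio: {week_logical / week_physical:.0f}x")  # ~12x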

For backups, and for central storage for VMs, online dedupe is a massive
win.  Offline dedupe, maybe.  Either way, dedupe is worthwhile.

-- 
Freddie Cash
fjwc...@gmail.com