Re: Content based storage

Hubert Kario Wed, 17 Mar 2010 08:26:50 -0700

On Wednesday 17 March 2010 09:48:18 Heinz-Josef Claes wrote:
> Hi,
> 
> just want to add one correction to your thoughts:
> 
> Storage is not cheap if you think about enterprise storage on a SAN,
> replicated to another data centre. Using dedup on the storage boxes leads
>  to performance issues and other problems - only NetApp is offering this at
>  the moment and it's not heavily used (because of the issues).


There are at least two other suppliers with inline dedup products, and there is 
an OSS solution: lessfs

> So I think it would be a big advantage for professional use to have dedup
> built into the filesystem - processors are faster and faster today and not
>  the cost drivers any more. I do not think it's a problem to "spend" one
>  core of a 2 socket box with 12 cores for this purpose.
> Storage is cost intensive:
> - SAN boxes are expensive
> - RAID5 in two locations is expensive
> - FC lines between locations are expensive (depending very much on where you
> are).

In-line dedup is expensive in two ways: first you have to cache the data going 
to disk and generate a checksum for it, then you have to look up whether such a 
block is already stored -- if the database doesn't fit into RAM (for a VM host 
that's more than likely) this requires at least a few disk seeks, if not a few 
dozen for really big databases. Then you should read the block/extent back and 
compare the two bit for bit. And only then can you write the data to the disk. 
That reduces your IOPS by at least an order of magnitude, if not more.
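
To make the steps above concrete, here is a rough sketch of the in-line write 
path. Everything in it is an assumption made for illustration: the class name, 
the 4 KiB block size, SHA-256 as the checksum, and an in-memory list standing 
in for the disk. It is not how lessfs or any vendor product does it; the 
counters only show where the extra lookups and read-backs come from.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, for illustration only


class InlineDedupStore:
    """Toy model of an in-line dedup write path (hypothetical, not a real API)."""

    def __init__(self):
        self.blocks = []     # stands in for the disk: block number -> data
        self.index = {}      # checksum -> list of block numbers with that checksum
        self.lookups = 0     # index lookups (disk seeks if the index is not in RAM)
        self.readbacks = 0   # blocks read back for the bit-for-bit comparison

    def write(self, data: bytes) -> int:
        """Store one block and return the block number that holds it."""
        digest = hashlib.sha256(data).digest()       # 1. checksum the cached data
        self.lookups += 1                            # 2. look the checksum up
        for blockno in self.index.get(digest, []):
            self.readbacks += 1                      # 3. read the candidate back
            if self.blocks[blockno] == data:         #    and compare bit for bit
                return blockno                       # duplicate: nothing written
        self.blocks.append(data)                     # 4. only now write the data
        blockno = len(self.blocks) - 1
        self.index.setdefault(digest, []).append(blockno)
        return blockno


store = InlineDedupStore()
first = store.write(b"a" * BLOCK_SIZE)
second = store.write(b"a" * BLOCK_SIZE)   # same content, so no new block
third = store.write(b"b" * BLOCK_SIZE)
```

Every write pays for a lookup, and every duplicate additionally pays for a 
read-back, before anything reaches the disk.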

For post-process dedup you can write as fast as your HDDs will allow you. And 
then, when your machine is mostly idle, you can go and churn through the data.
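
As a rough sketch of that idle-time pass, assuming the blocks are already on 
disk and again using SHA-256 plus a full comparison (the function name and the 
returned mapping are made up for illustration):

```python
import hashlib


def post_process_dedup(blocks):
    """Scan already-written blocks and return {duplicate_blockno: kept_blockno}.

    Hypothetical helper: `blocks` is a list of bytes standing in for the disk.
    """
    seen = {}    # checksum -> block numbers kept so far
    remap = {}   # duplicate block number -> block number it can point to
    for blockno, data in enumerate(blocks):
        digest = hashlib.sha256(data).digest()
        for candidate in seen.get(digest, []):
            if blocks[candidate] == data:   # bit-for-bit check before merging
                remap[blockno] = candidate
                break
        else:
            # No identical block seen yet: keep this one as the reference copy.
            seen.setdefault(digest, []).append(blockno)
    return remap


remap = post_process_dedup([b"a", b"b", b"a", b"a", b"c"])
```

The writes themselves never wait for the index; the cost is paid later, when 
nobody is waiting for the disks.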

IMHO in-line dedup is a good thing only as storage for backups -- where there 
is a high probability that the data being stored is a duplicate (and with a 
1:10 dedup ratio, i.e. one block stored for every ten written, that 
probability is 90%).
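
The arithmetic behind that figure, as a tiny sketch (the function name is made 
up, and it assumes the ratio means "blocks written per block actually stored"):

```python
def duplicate_fraction(written_per_stored: float) -> float:
    """Fraction of written blocks that turn out to be duplicates."""
    if written_per_stored < 1:
        raise ValueError("ratio must be at least 1")
    # One block in every `written_per_stored` is new; the rest are duplicates.
    return 1.0 - 1.0 / written_per_stored


backup_case = duplicate_fraction(10)   # the 1:10 case from above
```

With a ratio close to 1:1 the fraction drops towards zero, and the in-line 
lookups are mostly wasted work.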

So the CPU cost is only one factor. HDDs are a major bottleneck too.

All things considered, it would be best to have both post-process and in-line 
data deduplication, but I think that in-line dedup will see much less use.

> 
> Naturally, you would not use this feature for all kind of use cases (eg.
> heavily used database), but I think there is enough need.
> 
> my 2 cents,
> Heinz-Josef Claes
-- 
Hubert Kario
QBS - Quality Business Software
02-656 Warszawa, ul. Ksawerów 30/85
tel. +48 (22) 646-61-51, 646-74-24
www.qbs.com.pl

Quality Management System
compliant with ISO 9001:2000
