On Wednesday 17 March 2010 09:48:18 Heinz-Josef Claes wrote:
> Hi,
>
> just want to add one correction to your thoughts:
>
> Storage is not cheap if you think about enterprise storage on a SAN,
> replicated to another data centre. Using dedup on the storage boxes leads
> to performance issues and other problems - only NetApp is offering this at
> the moment and it's not heavily used (because of the issues).
There are at least two other suppliers with in-line dedup products, and there is an OSS solution: lessfs.

> So I think it would be a big advantage for professional use to have dedup
> built into the filesystem - processors are faster and faster today and not
> the cost drivers any more. I do not think it's a problem to "spend" one
> core of a 2 socket box with 12 cores for this purpose.
> Storage is cost intensive:
> - SAN boxes are expensive
> - RAID5 in two locations is expensive
> - FC lines between locations are expensive (depending very much on where you
> are).

In-line dedup is expensive in two ways. First, you have to cache the data going to disk and generate a checksum for it. Then you have to check whether such a block is already stored; if the checksum database doesn't fit into RAM (for a VM host that is more than likely), this takes at least a few disk seeks, if not a few dozen for really big databases. Then you should read the matching block/extent back and compare the two bit for bit, and only then write the data to disk. That reduces your IOPS by at least an order of magnitude, if not more.

With post-process dedup you can write as fast as your HDDs allow, and then, when the machine is mostly idle, go back and churn through the data.

IMHO in-line dedup is a good thing only as storage for backups, where there is a high probability that the stored data is duplicated (with a 1:10 dedup ratio, 9 out of every 10 blocks written are duplicates, so that probability is 90%).

So the CPU cost is only one factor; HDDs are a major bottleneck too. All things considered, it would be best to have both post-process and in-line data deduplication, but I think that in-line dedup will see much less use.

>
> Naturally, you would not use this feature for all kinds of use cases (e.g.
> heavily used databases), but I think there is enough need.
>
> my 2 cents,
> Heinz-Josef Claes

-- 
Hubert Kario
QBS - Quality Business Software
02-656 Warszawa, ul. Ksawerów 30/85
tel.
+48 (22) 646-61-51, 646-74-24
www.qbs.com.pl
Quality Management System compliant with ISO 9001:2000
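The in-line write path described in the reply has four steps: checksum the block, look it up, read the candidate back and compare it, and only then write. The following is a toy, in-memory sketch of that path. It assumes fixed 4 KiB blocks and SHA-256 checksums, and neither of these comes from the thread. The class name `InlineDedupStore` is hypothetical and is not code from btrfs, lessfs or any vendor. The index is a plain Python dict, so the sketch ignores the disk seeks that happen when the index does not fit in RAM. The `reads` counter only marks where a real system would pay for the extra read-back.

```python
import hashlib

BLOCK_SIZE = 4096  # assumption: fixed-size 4 KiB blocks


class InlineDedupStore:
    """Hypothetical toy model of in-line dedup: hash, look up, verify, then write."""

    def __init__(self):
        self.blocks = []  # simulated disk: stored block contents, indexed by block number
        self.index = {}   # checksum -> first block number stored with that checksum
        self.reads = 0    # read-backs done for bit-for-bit verification
        self.writes = 0   # physical block writes

    def write(self, data: bytes) -> int:
        """Store one block and return the block number that holds its contents."""
        digest = hashlib.sha256(data).digest()  # 1. checksum the incoming block
        candidate = self.index.get(digest)      # 2. look the checksum up in the index
        if candidate is not None:
            self.reads += 1                     # 3. read the stored block back...
            if self.blocks[candidate] == data:  #    ...and compare it bit for bit
                return candidate                # duplicate: no new write needed
        self.blocks.append(data)                # 4. only now write the data
        self.writes += 1
        block_no = len(self.blocks) - 1
        self.index.setdefault(digest, block_no)  # keep the first block on a hash collision
        return block_no


if __name__ == "__main__":
    store = InlineDedupStore()
    payload = b"\x00" * BLOCK_SIZE
    for _ in range(10):
        store.write(payload)
    print(store.writes, store.reads)
```

If you write the same block ten times, you get one physical write and nine verification reads. Nine of the ten incoming blocks were duplicates, which matches the 1:10 ratio and 90% figure above. Each of those nine duplicates still cost a lookup and a read before the write could be skipped.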