On Wednesday 17 March 2010 16:33:41 Leszek Ciesielski wrote:
> On Wed, Mar 17, 2010 at 4:25 PM, Hubert Kario <h...@qbs.com.pl> wrote:
> > On Wednesday 17 March 2010 09:48:18 Heinz-Josef Claes wrote:
> >> Hi,
> >>
> >> just want to add one correction to your thoughts:
> >>
> >> Storage is not cheap if you think about enterprise storage on a SAN,
> >> replicated to another data centre. Using dedup on the storage boxes
> >> leads to performance issues and other problems - only NetApp is offering
> >> this at the moment and it's not heavily used (because of the issues).
> >
> > there are at least two other suppliers with inline dedup products and
> > there is OSS solution: lessfs
> >
> >> So I think it would be a big advantage for professional use to have
> >> dedup build into the filesystem - processors are faster and faster today
> >> and not the cost drivers any more. I do not think it's a problem to
> >> "spend" on core of a 2 socket box with 12 cores for this purpose.
> >> Storage is cost intensive:
> >> - SAN boxes are expensive
> >> - RAID5 in two locations is expensive
> >> - FC lines between locations is expensive (depeding very much on where
> >> you are).
> >
> > In-line dedup is expensive in two ways: first you have to cache the data
> > going to disk and generate checksum for it, then you have to look if such
> > block is already stored -- if the database doesn't fit into RAM (for a VM
> > host it's more than likely) it requires at least few disk seeks, if not a
> > few dozen for really big databases. Then you should read the block/extent
> > back and compare them bit for bit. And only then write the data to the
> > disk. That reduces your IOPS by at least an order of magnitude, if not
> > more.
> 
> Sun decided that with SHA256 (which ZFS uses for normal checksumming)
> collisions are unlikely enough to skip the read/compare step:
> http://blogs.sun.com/bonwick/entry/zfs_dedup . That's not the case, of
> course, with btrfs-used CRC32, but a switch to a stronger hash would
> be recommended to reduce collisions anyway. And yes, for the truly
> paranoid, a forced verification (after the hashes match) is always an
> option.
> 

If the server contains financial data, I'd prefer "impossible", not 
"unlikely".

Reading further, Sun does provide a way to enable the compare step, by 
setting "verify" instead of "on":
zfs set dedup=verify <pool>

And, yes, I know that the probability of a hardware malfunction is vastly 
higher than the probability of a collision (that's why I wrote "should"; next 
time I'll write it as SHOULD as per RFC 2119 ;). But, as history has shown, 
all hash algorithms eventually get broken; the only question is when. If the 
FS does verify the data, then an attacker can't use collisions to gain access 
to data he shouldn't have access to.
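To illustrate the trade-off, here is a minimal sketch (in Python, with an 
invented DedupStore class -- not btrfs or ZFS code) of a dedup write path 
where a hash match is either trusted outright or confirmed bit-for-bit, 
analogous to dedup=on vs. dedup=verify:

```python
import hashlib

class DedupStore:
    """Toy content-addressed block store; 'verify' mimics ZFS dedup=verify."""

    def __init__(self, verify=False):
        self.blocks = {}   # digest -> stored block (stands in for the dedup table)
        self.verify = verify

    def write(self, block: bytes) -> bytes:
        digest = hashlib.sha256(block).digest()
        existing = self.blocks.get(digest)
        if existing is not None:
            if not self.verify or existing == block:
                # Hash matched (and, with verify on, the bytes matched too):
                # no new write, just reference the existing copy.
                return digest
            # verify caught a collision (or corruption); a real FS would
            # write a separate copy rather than trust the hash alone.
            raise ValueError("hash collision detected by verify")
        self.blocks[digest] = block   # first copy of this block: write it out
        return digest
```

With verify=False, a colliding block would silently be deduplicated against 
the wrong data; with verify=True, the extra read-back and compare costs IOPS 
but makes a collision harmless -- exactly the trade-off above.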
-- 
Hubert Kario
QBS - Quality Business Software
02-656 Warszawa, ul. Ksawerów 30/85
tel. +48 (22) 646-61-51, 646-74-24
www.qbs.com.pl

System Zarządzania Jakością
zgodny z normą ISO 9001:2000
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html