I use "subfile" to differentiate from file-level de-dupe, which is really only CAS. (A subfile de-dupe product will, of course, notice two files that are exactly the same as well -- just like a file-level CAS product will.)
Subfile to me means that it looks inside the file and looks for duplicated information within that file. Consider two versions of a file stored inside TSM, for example. A subfile de-dupe product would notice that most of the information in those two files is the same and store that information once. Then it would also store any information that is unique to each file. I stay away from terms like block, chunk, and fragment in this context because they mean different things to different people, and mean other things historically outside of de-dupe.

---
W. Curtis Preston
Backup Blog @ www.backupcentral.com
VP Data Protection, GlassHouse Technologies

-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:[EMAIL PROTECTED] On Behalf Of Paul Zarnowski
Sent: Monday, August 27, 2007 2:01 PM
To: [email protected]
Subject: Re: [ADSM-L] Data Deduplication

Curtis -

I'm unclear on your terminology. Are you equating "subfile" to "block" level deduping? To me, block level means block boundaries, whereas subfile doesn't have the boundary restriction. Perhaps I interpret these words this way because of my history. To me, a block is a 4K chunk (or 1K or some fixed amount). But I am suspecting that this is not what you mean. In fact, my impression was that some vendors deduped at a block level (my definition) and others at a subfile level, which to me is probably more valuable but also probably more performance-costly to implement.

I've read lots of articles about this and talked with many vendors. I'll take a look at your article. Thanks.

--
Paul Zarnowski                            Ph: 607-255-4757
Manager, Storage Services                 Fx: 607-255-8521
719 Rhodes Hall, Ithaca, NY 14853-3801    Em: [EMAIL PROTECTED]
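The boundary distinction debated in this thread can be sketched in code. The toy comparison below (invented for illustration, not any vendor's actual algorithm; all names and parameters are made up) contrasts fixed 4K blocks, which lose alignment as soon as a few bytes are inserted at the front of a file, with content-defined boundaries chosen by a rolling hash, which resynchronize after the insertion so most chunks of the two versions still match:

```python
import hashlib

def fixed_chunks(data, size=4096):
    """Split at fixed offsets -- the '4K chunk' block-level approach."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def content_defined_chunks(data, mask=0x1FF, window=32, base=31, mod=1 << 32):
    """Cut a chunk wherever a rolling hash of the trailing `window` bytes
    matches `mask`, so boundaries follow content, not offsets.
    Toy polynomial rolling hash; real products use e.g. Rabin fingerprints."""
    pow_w = pow(base, window, mod)   # base**window, to drop the oldest byte
    chunks, start, h = [], 0, 0
    for i, b in enumerate(data):
        h = (h * base + b) % mod
        if i >= window:
            h = (h - data[i - window] * pow_w) % mod
        if i + 1 >= window and (h & mask) == mask:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks

def dedupe_ratio(chunker, *files):
    """Fraction of chunk instances that a chunk store would NOT
    have to keep, because an identical chunk was already stored."""
    total, store = 0, set()
    for f in files:
        for c in chunker(f):
            total += 1
            store.add(hashlib.sha1(c).hexdigest())
    return 1 - len(store) / total

# Two "versions" of a file: v2 inserts six bytes at the front of v1.
v1 = b"".join(hashlib.sha256(bytes([i % 256, i // 256])).digest()
              for i in range(512))          # 16 KiB of pseudo-random data
v2 = b"PREFIX" + v1

fixed = dedupe_ratio(fixed_chunks, v1, v2)
cdc = dedupe_ratio(content_defined_chunks, v1, v2)
```

With fixed blocks, every 4K block of v2 is shifted by six bytes, so almost nothing dedupes; with content-defined chunking, only the chunk(s) touching the inserted prefix differ, which is why the boundary-free "subfile" approach is more valuable for versioned backup data, and also why it costs more to compute.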
