I use "subfile" to differentiate from file-level de-dupe, which is really only CAS. (A subfile de-dupe product will, of course, notice two files that are exactly the same as well -- just like a file-level CAS product will.)
Subfile to me means that it looks inside the file and looks for duplicated information within that file. Consider two versions of a file stored inside TSM, for example. A subfile de-dupe product would notice that most of the information in those two files is the same and store that information once. Then it would also store any information that is unique to each file. I stay away from terms like block, chunk, and fragment in this context because they mean different things to different people, and mean other things historically outside of de-dupe.

---
W. Curtis Preston
Backup Blog @ www.backupcentral.com
VP Data Protection, GlassHouse Technologies

-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:[EMAIL PROTECTED] On Behalf Of Paul Zarnowski
Sent: Monday, August 27, 2007 2:01 PM
To: [email protected]
Subject: Re: [ADSM-L] Data Deduplication

Curtis -

I'm unclear on your terminology. Are you equating "subfile" to "block" level deduping? To me, block level means block boundaries, whereas subfile doesn't have the boundary restriction. Perhaps I interpret these words this way because of my history. To me, a block is a 4K chunk (or 1K or some fixed amount). But I am suspecting that this is not what you mean. In fact, my impression was that some vendors deduped at a block level (my definition) and others at a subfile level, which to me is probably more valuable but also probably more performance-costly to implement.

I've read lots of articles about this and talked with many vendors. I'll take a look at your article. Thanks.

--
Paul Zarnowski                            Ph: 607-255-4757
Manager, Storage Services                 Fx: 607-255-8521
719 Rhodes Hall, Ithaca, NY 14853-3801    Em: [EMAIL PROTECTED]
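The boundary distinction debated in this thread can be sketched in code. The toy comparison below (invented for illustration, not any vendor's actual algorithm; all names and parameters are made up) contrasts fixed 4K blocks, which lose alignment as soon as a few bytes are inserted at the front of a file, with content-defined boundaries chosen by a rolling hash, which resynchronize after the insertion so most chunks of the two versions still match:

```python
import hashlib

def fixed_chunks(data, size=4096):
    """Split at fixed offsets -- the '4K chunk' block-level approach."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def content_defined_chunks(data, mask=0x1FF, window=32, base=31, mod=1 << 32):
    """Cut a chunk wherever a rolling hash of the trailing `window` bytes
    matches `mask`, so boundaries follow content, not offsets.
    Toy polynomial rolling hash; real products use e.g. Rabin fingerprints."""
    pow_w = pow(base, window, mod)   # base**window, to drop the oldest byte
    chunks, start, h = [], 0, 0
    for i, b in enumerate(data):
        h = (h * base + b) % mod
        if i >= window:
            h = (h - data[i - window] * pow_w) % mod
        if i + 1 >= window and (h & mask) == mask:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks

def dedupe_ratio(chunker, *files):
    """Fraction of chunk instances that a chunk store would NOT
    have to keep, because an identical chunk was already stored."""
    total, store = 0, set()
    for f in files:
        for c in chunker(f):
            total += 1
            store.add(hashlib.sha1(c).hexdigest())
    return 1 - len(store) / total

# Two "versions" of a file: v2 inserts six bytes at the front of v1.
v1 = b"".join(hashlib.sha256(bytes([i % 256, i // 256])).digest()
              for i in range(512))          # 16 KiB of pseudo-random data
v2 = b"PREFIX" + v1

fixed = dedupe_ratio(fixed_chunks, v1, v2)
cdc = dedupe_ratio(content_defined_chunks, v1, v2)
```

With fixed blocks, every 4K block of v2 is shifted by six bytes, so almost nothing dedupes; with content-defined chunking, only the chunk(s) touching the inserted prefix differ, which is why the boundary-free "subfile" approach is more valuable for versioned backup data, and also why it costs more to compute.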
