Excerpts from Peter A's message of 2011-01-05 22:58:36 -0500:
> On Wednesday, January 05, 2011 08:19:04 pm Spelic wrote:
> > > I'd just make it always use the fs block size. No point in making it
> > > variable.
> >
> > Agreed. What is the reason for variable block size?
>
> First post on this list - I was mostly just reading so far to learn more
> on fs design, but this is one topic I (unfortunately) have experience
> with...
>
> You wouldn't believe the difference variable block size dedupe makes. For
> a pure fileserver it's OK to dedupe at the block level, but for most other
> uses, variable is king. One big example is backups. NetBackup and most
> others produce one stream with all the data, even when backing up to disk.
> Imagine you move a whole lot of data from one directory to another -
> think a directory with huge video files. As a filesystem it would be
> deduped nicely. The backup stream, however, may or may not have matching
> fs blocks. If the directory name before and after has the same length and
> such, then yes, dedupe works. Directory name is a byte shorter?
> Everything in the stream will be offset by one byte, and no dedupe will
> occur at all on the whole dataset. In the real world, just compare the
> dedupe performance of an Oracle 7000 (ZFS, and therefore fs-block based)
> to a DataDomain (variable length) in this usage scenario. Among our
> customers we see something like a 3-17x dedupe ratio on the DD, and
> 1.02-1.05 on the 7000.
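[The one-byte-offset failure mode above is easy to demonstrate. Below is a toy sketch - not DataDomain's actual (proprietary) algorithm, which uses Rabin fingerprints - that chunks the same random stream two ways: fixed 4k blocks, and a simple content-defined scheme that cuts wherever a rolling sum over the last few bytes hits a boundary pattern. Prepending a single byte kills every fixed-block match, while the content-defined boundaries re-synchronize after the insertion. All names and parameters here are illustrative.]

```python
import hashlib
import random

def fixed_chunks(data, size=4096):
    """fs-block-style dedupe: split at fixed offsets."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def content_chunks(data, window=16, mask=0x3FF, min_size=256):
    """Toy content-defined chunking: cut where the rolling sum of the
    last `window` bytes matches `mask`. Because the cut condition
    depends only on local content, boundaries re-align after an
    insertion shifts the stream."""
    chunks, start, rolling = [], 0, 0
    for i in range(len(data)):
        rolling += data[i]
        if i - start >= window:
            rolling -= data[i - window]          # slide the window
        if i - start + 1 >= min_size and (rolling & mask) == mask:
            chunks.append(data[start:i + 1])     # cut here
            start, rolling = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])              # trailing remainder
    return chunks

def hashes(chunks):
    return {hashlib.sha256(c).hexdigest() for c in chunks}

random.seed(42)
data = bytes(random.getrandbits(8) for _ in range(1 << 16))  # 64 KiB
shifted = b"\x00" + data  # the whole stream is now offset by one byte

fixed_common = hashes(fixed_chunks(data)) & hashes(fixed_chunks(shifted))
cdc_common = hashes(content_chunks(data)) & hashes(content_chunks(shifted))

print("fixed-block chunks shared after 1-byte shift:", len(fixed_common))
print("content-defined chunks shared after 1-byte shift:", len(cdc_common))
```

With random data, the fixed-block overlap is zero - exactly the "no dedupe at all on the whole dataset" effect Peter describes - while most content-defined chunks still match, since the cut points are determined by the bytes themselves rather than by their offset in the stream.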
What is the smallest granularity that the DataDomain searches for in terms
of dedupe? Josef's current setup isn't restricted to a specific block size,
but there is a min match of 4k.

-chris