On Apr 14, 2008, at 11:20 AM, Tino Schwarze wrote:

> Of course, you shouldn't underestimate the cost of managing a lot of
> small files (my pool has about 5 million files, some of them are
> pretty large), so the pool will have even more files, which means
> more seeking and looking up file blocks.
>
> Introducing file chunking would introduce a new abstraction layer - a
> file would need to be split into chunks and recreated for restore. You
Tino -- thanks for posting this. These issues are exactly what I had in
mind when I posted about adding sub-file deduplication. There's a lot
more work to do and definitely a bunch more housekeeping. Right now,
BackupPC gets off "easy" by using hardlinks to do the dedupe. Once we
delve below the file level, a brand new data structure/mechanism needs
to be designed and built to efficiently link all of those blocks
together.

If you look at the commercial solutions that provide this functionality
exclusively in software (as opposed to appliance-based solutions), you
see that it is quite processor intensive. If there are flaws in the
design of the mechanism that tracks the chunks, you will most definitely
see pain in the backup and restore processes compared to the existing
mechanism of deduping at the file level.
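Just to make the housekeeping concrete, here's a rough sketch in Python
of the kind of bookkeeping sub-file dedupe needs: fixed-size chunks
keyed by hash, plus a per-file "recipe" that has to be stored somewhere
so the file can be reassembled on restore. It's purely illustrative --
the chunk size, the SHA-1 pool layout, and all of the names are made up,
and BackupPC does nothing like this today -- but it shows where the
extra lookups come from:

# Illustrative only: fixed-size chunking plus a hash-indexed chunk pool.
# Real sub-file dedupe typically uses content-defined chunking and a much
# more careful on-disk index; this is not BackupPC code.

import hashlib
import os

CHUNK_SIZE = 64 * 1024  # 64 KiB fixed chunks, chosen arbitrarily here

class ChunkPool:
    """Stores each unique chunk once, keyed by its SHA-1 digest."""

    def __init__(self, pool_dir):
        self.pool_dir = pool_dir
        os.makedirs(pool_dir, exist_ok=True)

    def put(self, data):
        digest = hashlib.sha1(data).hexdigest()
        path = os.path.join(self.pool_dir, digest)
        if not os.path.exists(path):   # dedupe: only new chunks hit disk
            with open(path, "wb") as f:
                f.write(data)
        return digest

    def get(self, digest):
        with open(os.path.join(self.pool_dir, digest), "rb") as f:
            return f.read()

def backup_file(pool, src_path):
    """Split a file into chunks; return the ordered digest list (recipe)."""
    recipe = []
    with open(src_path, "rb") as f:
        while True:
            chunk = f.read(CHUNK_SIZE)
            if not chunk:
                break
            recipe.append(pool.put(chunk))
    return recipe

def restore_file(pool, recipe, dst_path):
    """Recreate a file by concatenating its chunks in order."""
    with open(dst_path, "wb") as f:
        for digest in recipe:
            f.write(pool.get(digest))

Every backup hashes and looks up every chunk, and every restore has to
fetch the recipe plus each chunk it references, which is exactly where
the extra seeking and processor time show up if the tracking mechanism
isn't designed carefully.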
--
Michael Barrow
michael at michaelbarrow dot name