On Apr 14, 2008, at 11:20 AM, Tino Schwarze wrote:

> Of course, you shouldn't underestimate the cost of managing a lot of
> small files (my pool has about 5 million files, some of them are
> pretty large), so the pool will have even more files, which means
> more seeking and looking up file blocks.
>
> Introducing file chunking would introduce a new abstraction layer - a
> file would need to be split into chunks and recreated for restore. You
Tino -- thanks for posting this. These issues are exactly what I had in
mind when I posted about adding sub-file deduplication. There's a lot
more work to do and definitely a bunch more housekeeping. Right now,
BackupPC gets off "easy" by using hardlinks to do the dedupe. Once we
delve below the file level, a brand new data structure/mechanism needs
to be designed and built to efficiently link all of those blocks
together.

If you look at the commercial solutions that provide this functionality
exclusively in software (as opposed to appliance-based solutions), you
see that it is quite processor intensive. If there are flaws in the
design of the mechanism that tracks the chunks, you will most definitely
see pain in the backup and restore processes compared to the existing
mechanism of deduping at the file level.
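Just to make the housekeeping concrete, here's a rough sketch in Python
of the kind of bookkeeping sub-file dedupe needs: fixed-size chunks
keyed by hash, plus a per-file "recipe" that has to be stored somewhere
so the file can be reassembled on restore. It's purely illustrative --
the chunk size, the SHA-1 pool layout, and all of the names are made up,
and BackupPC does nothing like this today -- but it shows where the
extra lookups come from:

# Illustrative only: fixed-size chunking plus a hash-indexed chunk pool.
# Real sub-file dedupe typically uses content-defined chunking and a much
# more careful on-disk index; this is not BackupPC code.

import hashlib
import os

CHUNK_SIZE = 64 * 1024  # 64 KiB fixed chunks, chosen arbitrarily here

class ChunkPool:
    """Stores each unique chunk once, keyed by its SHA-1 digest."""

    def __init__(self, pool_dir):
        self.pool_dir = pool_dir
        os.makedirs(pool_dir, exist_ok=True)

    def put(self, data):
        digest = hashlib.sha1(data).hexdigest()
        path = os.path.join(self.pool_dir, digest)
        if not os.path.exists(path):   # dedupe: only new chunks hit disk
            with open(path, "wb") as f:
                f.write(data)
        return digest

    def get(self, digest):
        with open(os.path.join(self.pool_dir, digest), "rb") as f:
            return f.read()

def backup_file(pool, src_path):
    """Split a file into chunks; return the ordered digest list (recipe)."""
    recipe = []
    with open(src_path, "rb") as f:
        while True:
            chunk = f.read(CHUNK_SIZE)
            if not chunk:
                break
            recipe.append(pool.put(chunk))
    return recipe

def restore_file(pool, recipe, dst_path):
    """Recreate a file by concatenating its chunks in order."""
    with open(dst_path, "wb") as f:
        for digest in recipe:
            f.write(pool.get(digest))

Every backup hashes and looks up every chunk, and every restore has to
fetch the recipe plus each chunk it references, which is exactly where
the extra seeking and processor time show up if the tracking mechanism
isn't designed carefully.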
--
Michael Barrow
michael at michaelbarrow dot name