On Tuesday, 28 April 2009 22:16:19, Thomas Glanzmann wrote:
> Hello Heinz,
>
> > It's not only cpu time, it's also memory. You need 32 byte for each 4k
> > block. It needs to be in RAM for performance reason.
>
> exactly and that is not going to scale.
>
>         Thomas

Hi Thomas,

I wrote a backup tool that uses dedup, so I know a little about the
problem and about the performance impact when the checksums are not
in memory (which is an option in that tool):

http://savannah.gnu.org/projects/storebackup

Dedup really helps a lot, more than I could have imagined before I got
involved in this kind of backup. To give a simple example, you would not
believe how many identical files there are in a filesystem.

EMC has very big boxes for this with lots of RAM in them. I think the
first problem that has to be solved is the memory problem. Perhaps
something asynchronous that finds identical blocks and stores the
checksums on disk?

Heinz
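
P.S. To put rough numbers on the "32 byte for each 4k block" figure
quoted above, here is a small back-of-the-envelope sketch. The function
name index_bytes and the constants are only illustrative, and it counts
the raw checksums alone; a real lookup structure would presumably need
more than this.

```python
BLOCK_SIZE = 4096  # 4 KiB blocks, as in the quoted figure
ENTRY_SIZE = 32    # bytes of checksum per block, as in the quoted figure


def index_bytes(storage_bytes, block_size=BLOCK_SIZE, entry_size=ENTRY_SIZE):
    """Return the bytes needed to hold one checksum per block.

    Illustrative helper: ignores any hash-table or tree overhead.
    """
    blocks = -(-storage_bytes // block_size)  # ceiling division
    return blocks * entry_size


if __name__ == "__main__":
    TIB = 2**40
    GIB = 2**30
    for tib in (1, 4, 16):
        need = index_bytes(tib * TIB) / GIB
        print(f"{tib} TiB of data -> {need:.0f} GiB of checksums")
```

That is 1/128 of the stored data, so about 8 GiB of checksums per TiB,
which is why I think the memory problem has to be solved first.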