Jeffrey J. Kosowsky wrote at about 21:52:57 -0500 on Wednesday, March 2, 2011:
> Craig Barratt wrote at about 00:32:57 -0800 on Wednesday, March 2, 2011:
> As per my earlier post, I worry about the concept of still having
> potential collisions (even if rare) and chains. The reason being both
> that it is inelegant and that presumably it requires pool files to be
> decompressed and compared byte-by-byte with any new file to check if
> there is a collision. For large files, decompressing the entire file
> and then comparing byte-by-byte is *slow* and could account for a
> significant portion of the backup time.
>
> Wouldn't it be better to pick a more secure checksum such as sha256sum
> or even sha512sum where the chances of a collision are so
> astronomically small as to be less likely than having a bit error in
> your RAM. No collisions have ever been found for either of them and
> they allow maximum file sizes up to 2^64-1 and 2^128-1
> respectively. Presumably, the chance of a random collision is 2^-256
> and 2^-512 respectively which are numbers so small that physical
> hardware errors are more likely.
>
> Eliminating any statistical likelihood of collision would then allow
> you to simplify the code by eliminating the need for chains while also
> speeding up adding files to the pool since you wouldn't need to check
> for new collisions but just use the sha-sum.
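For what it's worth, here is a rough sketch of what "just use the sha-sum" might look like. It is Python rather than BackupPC's Perl, and it is purely illustrative: pool_key, add_to_pool and CHUNK_SIZE are made-up names, not anything in the BackupPC code, and the sketch ignores pool compression entirely. The point is only that a file is read once, in chunks, and an existing pool entry with the same digest is trusted without any decompress-and-compare step.

```python
import hashlib
import os
import shutil

# Hypothetical chunk size: read 1 MiB at a time so that large files
# are never held in memory all at once.
CHUNK_SIZE = 1 << 20


def pool_key(path):
    """Return the SHA-256 hex digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK_SIZE), b""):
            digest.update(chunk)
    return digest.hexdigest()


def add_to_pool(path, pool_dir):
    """Store `path` in `pool_dir` under its digest (hypothetical layout).

    Returns (key, stored). `stored` is False when an entry with the same
    digest already exists; in that case nothing is copied and no
    byte-by-byte comparison is made, which is exactly the assumption
    being proposed: equal digest is treated as equal content.
    """
    key = pool_key(path)
    target = os.path.join(pool_dir, key)
    if os.path.exists(target):
        return key, False
    os.makedirs(pool_dir, exist_ok=True)
    shutil.copyfile(path, target)
    return key, True
```

With something along these lines there are no chains to walk: the digest alone names the pool file, so adding a file costs one pass over the new data and one existence check.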
Just as a follow-up, in case any of you are worried about the 1 in
2^512 (roughly 1.3 x 10^154) chance of a collision, I offer you these
consolations:

1. The total number of particles in the universe is estimated to be
   between 10^72 and 10^87.

2. The fame and fortune that you will gain from being the first to
   find a SHA-256 or SHA-512 collision will surely outweigh the cost
   of any data lost in BackupPC due to such a collision.

Plus, should collisions ever be found, I'm sure the NSA will be there
ready and waiting with SHA-1024 and beyond, which could easily be
swapped in, assuming that Craig writes the code in his usual modular
fashion...

_______________________________________________
BackupPC-devel mailing list
BackupPC-devel@lists.sourceforge.net
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-devel
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/