Jeffrey J. Kosowsky wrote at about 21:52:57 -0500 on Wednesday, March 2, 2011:
 > Craig Barratt wrote at about 00:32:57 -0800 on Wednesday, March 2, 2011:
 > As per my earlier post, I worry about the concept of still having
 > potential collisions (even if rare) and chains, both because it is
 > inelegant and because it presumably requires pool files to be
 > decompressed and compared byte-by-byte with any new file to check if
 > there is a collision. For large files, decompressing the entire file
 > and then comparing byte-by-byte is *slow* and could account for a
 > significant portion of the backup time.
 > 
 > Wouldn't it be better to pick a more secure checksum such as sha256sum
 > or even sha512sum, where the chances of a collision are so
 > astronomically small as to be less likely than having a bit error in
 > your RAM? No collisions have ever been found for either of them, and
 > they allow maximum input sizes of 2^64-1 and 2^128-1 bits
 > respectively. Presumably, the chance of a random collision between
 > two given files is 2^-256 and 2^-512 respectively, which are numbers
 > so small that physical hardware errors are more likely.
 > 
 > Eliminating any statistical likelihood of collision would then allow
 > you to simplify the code by eliminating the need for chains while also
 > speeding up adding files to the pool since you wouldn't need to check
 > for new collisions but just use the sha-sum.
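
To make the proposal above concrete, here is a minimal sketch of a pool keyed purely on a SHA-256 digest, with no chains and no byte-by-byte comparison. It is not BackupPC code: the function names (`file_digest`, `pool_path`, `add_to_pool`) and the two-level directory layout are assumptions made up for illustration, and compression is left out entirely.

```python
import hashlib
import os


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash a file in fixed-size chunks so large files never sit fully in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def pool_path(pool_dir, digest):
    """Hypothetical layout: two fan-out directory levels taken from the digest."""
    return os.path.join(pool_dir, digest[:2], digest[2:4], digest)


def add_to_pool(pool_dir, src_path):
    """Store src_path under its digest; return (pool file path, newly_stored)."""
    digest = file_digest(src_path)
    dest = pool_path(pool_dir, digest)
    if os.path.exists(dest):
        # Same digest is treated as same content: no decompress, no compare.
        return dest, False
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    with open(src_path, "rb") as src, open(dest, "wb") as dst:
        for chunk in iter(lambda: src.read(1 << 20), b""):
            dst.write(chunk)
    return dest, True
```

The trade-off is that the new file is read once to hash it, but an existing pool file is never opened at all when the digest already exists.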

Just as a follow-up, in case any of you are worried about the 1 in
2^512 (about 10^154) chance of a collision, I offer you these
consolations:
1. The total number of particles in the universe is estimated to be
   between 10^72 and 10^87
2. The fame and fortune that you will gain from being the first to
   find a SHA-256 or SHA-512 collision will surely outweigh the cost
   of any data lost in BackupPC due to such a collision.
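
For anyone who wants to check the exponents, the sketch below converts powers of two into powers of ten and also estimates the chance of any collision anywhere in a pool, using the standard birthday approximation n^2 / 2^(bits+1). The pool size of 10^12 files is an arbitrary assumption picked for illustration, not a figure from this thread.

```python
import math


def log10_pow2(bits):
    """Return log10(2**bits), i.e. the power of ten equal to 2**bits."""
    return bits * math.log10(2)


def log10_birthday_bound(n_files, bits):
    """log10 of the approximate probability that any two of n_files
    random digests of the given width collide: n^2 / 2^(bits + 1)."""
    return 2 * math.log10(n_files) - (bits + 1) * math.log10(2)


if __name__ == "__main__":
    n_files = 10**12  # assumed pool size, for illustration only
    for bits in (256, 512):
        print(
            f"{bits}-bit digest: 2^{bits} ~ 10^{log10_pow2(bits):.1f}, "
            f"any collision among {n_files:.0e} files ~ "
            f"10^{log10_birthday_bound(n_files, bits):.1f}"
        )
```

So 2^256 is roughly 10^77 and 2^512 roughly 10^154, and even the pool-wide birthday estimate for a 256-bit digest stays below 10^-53 at that pool size.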

Plus, should collisions ever be found, I'm sure the NSA will be there
ready and waiting with SHA-1024 and beyond, which could easily be
swapped in, assuming that Craig writes the code in his usual modular
fashion...
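
As a rough illustration of what "swapped in" could look like, here is a sketch where the digest algorithm is a single setting and each digest is tagged with the algorithm that produced it. `POOL_DIGEST`, `make_hasher` and `digest_bytes` are hypothetical names, not BackupPC settings or functions; only the `hashlib` calls are real.

```python
import hashlib

# Hypothetical configuration value; the name is illustrative only.
POOL_DIGEST = "sha256"


def make_hasher(algorithm=POOL_DIGEST):
    """Return a fresh hash object for the configured algorithm."""
    if algorithm not in hashlib.algorithms_available:
        raise ValueError(f"unsupported digest algorithm: {algorithm}")
    return hashlib.new(algorithm)


def digest_bytes(data, algorithm=POOL_DIGEST):
    """Return the digest prefixed with its algorithm name, e.g. 'sha256:ab12...'."""
    h = make_hasher(algorithm)
    h.update(data)
    return f"{algorithm}:{h.hexdigest()}"
```

Tagging each digest with its algorithm means pool entries written under an older digest can coexist with newer ones while a migration is in progress.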

_______________________________________________
BackupPC-devel mailing list
BackupPC-devel@lists.sourceforge.net
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-devel
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/
