Extensive top-post trail deleted.
On 10/05/2011 02:39 AM, Daniel Sparrman wrote:
As for the hash conflict, the DD uses SHA-1 with a variable block length for deduplication. Theoretically, there is a 1 in 2^160 chance it will happen. Doesn't seem to be that bad, but your first hash collision is more likely to happen than that number suggests.
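For a sense of scale, the usual birthday approximation gives the chance that any two blocks in a store share a digest. What follows is only a rough sketch, assuming SHA-1 output behaves like a uniformly random 160-bit value and that nobody is constructing collisions on purpose. The function names and the block counts are made up for illustration; they are not DD figures.

```python
import math

HASH_BITS = 160  # SHA-1 digest size in bits


def collision_probability(num_blocks: int, hash_bits: int = HASH_BITS) -> float:
    """Approximate chance that at least two of num_blocks distinct blocks
    share a digest, assuming uniformly random digests (birthday bound)."""
    pairs = num_blocks * (num_blocks - 1) / 2
    # p ~= 1 - exp(-pairs / 2^bits); expm1 keeps precision when p is tiny.
    return -math.expm1(-pairs / 2.0 ** hash_bits)


def blocks_for_probability(p: float, hash_bits: int = HASH_BITS) -> float:
    """Approximate number of blocks needed before the collision chance reaches p."""
    return math.sqrt(2.0 * 2.0 ** hash_bits * -math.log1p(-p))


if __name__ == "__main__":
    # Hypothetical store sizes, chosen only to show how the odds scale.
    for n in (10**9, 2**40, 10**15):
        print(f"{n:.3e} blocks -> collision chance ~ {collision_probability(n):.3e}")
    print(f"blocks for a 50% chance ~ {blocks_for_probability(0.5):.3e}")
```

The point of the sketch is that the odds grow with the square of the number of blocks stored, so the per-pair figure of 1 in 2^160 understates the chance of a first collision, while the result for any realistic store remains very small.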
I agree with your technical analysis, and I feel your disquiet. Waay back in the '80s, I brought an (8mm :) tape to a meeting with a dept official to say "One chance in a billion means to me that there are five broken files on this tape". The topic then was "should we make copies of these?"

But I feel that you express these numbers in a vacuum, which misleads. The appropriate judgement has to be, not "Is an error possible?", but "How risky is this?"; and that risk has to be compared to the other risks you're taking.

I feel that you are focused on the unpredictably large impact of a collision. "All my backups are gone!" is emotionally accessible to any of us, and makes me shudder. But that scenario is not a plausible result of a hash collision. Not that the reality is peachy: "Some difficult-to-identify set of my files are now corrupt" is quite bad enough, thank you.

A 1/10^30 risk just doesn't have the same emotional availability. But the homeopathic chances of it happening ought to temper the resistance.

I would invoke the analogy of driving your car across the country vs. taking an airplane; many are paralyzed by the risks of air travel, when the actuaries will tell you with great precision that you've a better chance of dying in the drive _to the airport_ than once you've taken off. Similarly, I'd guess that more DD failures have happened due to physical violence than due to hash collisions.

- Allen S. Rout
