Hi Scott, Thank you for your reply!
On 10/29/2015 01:40 AM, Scott Robison wrote: > On Wed, Oct 28, 2015 at 6:37 PM, Eduard <[email protected] > <mailto:[email protected]>> wrote: > > If fossil didn't say it used SHA1 to generate artifact IDs, I don't > think anyone would care how it generated IDs. > > An artifact ID is a way of assigning a fixed length identifier to an > artifact with good distribution of IDs in the fixed length space > provided. It is not intended to be a cryptographic. > (...) > In this case, it's just an identifier, and the odds of a > non-malicious collision are so close to zero that those odds might as > well be zero. That's the thing, an artifact ID is *not* just an identifier! I don't know whether this is an intentional feature or not, but a Fossil repository is structured as a Merkle tree (or a "hash tree"). If I know that a single commit is genuine (because I wrote it down on a piece of paper or because it is PGP signed), then that guarantees that all of its ancestors (and all of their files) have not been tampered with. This is a very powerful property, assuming that the hash function is cryptographically secure. In contrast, suppose that instead artifacts were identified by their CRC160 (like CRC32, but longer) or by some sort of GUID. Accidental collisions would be extremely unlikely, but intentional collisions or preimages would be trivial to forge. Knowing that a particular commit manifest is genuine would be pointless; anyone could make up a different ancestor tree that matches the parent ID in the manifest. Worse yet, that would not even guarantee the 'goodness' of the files in the commit (anyone would be able to make up a different file with the same artifact ID). > You can't create a collision in advance based on not knowing who is > going to commit what to the repository in advance. > > Let's say you do, after the fact, manage to create a collision. If you > try to upload it to the repository it will be ignored because fossil > believes (correctly) it already has the artifact in question. That's not a collision, that's a second-preimage. My point is rather that an attacker can serve the clean version of the file to the auditors, and the malicious version to the users (who trust the auditors to tell them whether a commit is okay to use). This doesn't require a second-preimage attack against the hash, only the ability to generate collisions (which is possible right now, but expensive). > As you observe, one could in theory mount a MITM attack. At this point > what is to stop them from serving a completely alien repository that > they've specially crafted? No collisions required. It would then be obvious that the repository is completely alien since none of the artifact IDs match. If the user knows at least one top-level genuine artifact ID (for example, because they trust a developer and that developer signs their commits), then that guarantees the integrity of all the ancestors. > In fact, the "easiest" way to getting people to use malicious software > is to host a compromised repository and convince people to use it > instead of the "blessed" repository. I agree, that's the way it usually works in practice. But no amount of checking (save for re-hashing every single artifact in a different database and signing *that*) can protect against SHA1 collision (or second-preimage) attacks. > If you want to change the way fossil does things to limit the > possibility of fraudulent artifacts, that's fine. Perhaps prefixing the > blob data with a length (ala git) might help mitigate the possibility of > hash collisions. Perhaps creating a hash of the complete commit (vs just > the manifest) and storing it in the manifest might help. If the hash function itself is broken, neither of those would work. The first fix would only work if SHA1 collisions are only possible with different-length inputs (from what I understand, they're usually on equal-length inputs), and the second fix is essentially the R-card (which has been deprecated for efficiency reasons, and because it is redundant with the files' artifact IDs which (are supposed to) guarantee their contents' genuineness). > Ultimately, one can chase hash algorithms forever trying to create some > ultimately secure ideal. In the case of actual security software, I can > see the point. A DVCS *is* a security software. It is perhaps the most important security software, since the security and integrity of all other software ultimately depends on it. The integrity of the sqlite repository (which is probably used by over a billion people right now) depends on it. > one can chase hash algorithms forever I know that it may appear so (I thought so too for a long time), but it's not a question of choosing better hash algorithms forever either. >From the very beginning SHA1 had a very short digest (n=160 bits), so the most security it could ever have against a collision attack is n/2=80 bits (i.e. that's assuming that it is a perfect hash function, which we now know it isn't). It stands to reason that any significant reduction in collision security from that point would bring an attack into the realm of feasibility (just 30 bits would be enough to utterly break it). Suppose that instead we chose a 512-bit hash. Its maximum collision security is 256 bits. Suppose that some cryptographic attack eventually manages to downgrade that to 196 bits (i.e. a 60 bit reduction in security, which is huge and extremely unlikely). That means it would take 2^196 operations to generate a collision. Could technological advancement maybe make that possible someday? I'll just quote Bruce Schneier's calculation: -- begin quote -- One of the consequences of the second law of thermodynamics is that a certain amount of energy is necessary to represent information. To record a single bit by changing the state of a system requires an amount of energy no less than kT, where T is the absolute temperature of the system and k is the Boltzman constant. (Stick with me; the physics lesson is almost over.) Given that k = 1.38×10^-16 erg/°Kelvin, and that the ambient temperature of the universe is 3.2°Kelvin, an ideal computer running at 3.2°K would consume 4.4×10^-16 ergs every time it set or cleared a bit. To run a computer any colder than the cosmic background radiation would require extra energy to run a heat pump. Now, the annual energy output of our sun is about 1.21×10^41 ergs. This is enough to power about 2.7×10^56 single bit changes on our ideal computer; enough state changes to put a 187-bit counter through all its values. If we built a Dyson sphere around the sun and captured all its energy for 32 years, without any loss, we could power a computer to count up to 2^192. Of course, it wouldn't have the energy left over to perform any useful calculations with this counter. But that's just one star, and a measly one at that. A typical supernova releases something like 10^51 ergs. (About a hundred times as much energy would be released in the form of neutrinos, but let them go for now.) If all of this energy could be channeled into a single orgy of computation, a 219-bit counter could be cycled through all of its states. These numbers have nothing to do with the technology of the devices; they are the maximums that thermodynamics will allow. And they strongly imply that brute-force attacks against 256-bit keys will be infeasible until computers are built from something other than matter and occupy something other than space. -- end quote -- Best, Eduard _______________________________________________ fossil-users mailing list [email protected] http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users

