Dear coreutils hackers:

I think we should protect users by offering a replacement for md5sum. md5sum is dangerous to users: it is collision-resistant only when the data being processed is free of malicious input, and it is vulnerable to collisions (and other, weirder attacks) when the data has been supplied or tampered with by an attacker.
Whether any given user could be harmed by these weaknesses depends on the details of how they use md5sum. Experience has shown that the use of exploitable hash functions like MD5 has repeatedly left users vulnerable to attackers, even when the engineers who chose MD5 didn't expect that to happen.

I'm writing to this list because at the GNU 30th anniversary party, I saw RMS announce that protecting users from being harmed through their use of their networked software was henceforth a priority for the GNU project. I was happy to hear that, as I share that priority.

Unfortunately, md5sum is still far and away the most widely used tool for hashing data. Let's provide a better replacement that people will actually switch to!

In my opinion, three things are required for a candidate to actually replace md5sum in practice:

1. It must be *faster*, or at least not slower, than md5sum. We've accomplished this with the invention of the BLAKE2 hashing algorithm, which is faster than MD5 on most systems and for most input sizes, while also being secure: it is not vulnerable to any of the security issues that plague MD5.

2. It must be the single de facto standard. If we tell people "Stop using MD5, and start using BLAKE2, SHA3, or Skein: your choice.", then they will continue using MD5. If instead we tell them "Stop using MD5 and start using BLAKE2.", then there is a chance they will actually take our advice. (There are two sound, rational reasons for this. The first is that they don't want to pay the cost of learning about the options in order to make a decision, and they know that if they simply delay switching, there will eventually be a de facto standard that they can switch to without choosing among alternatives. The second is that they want their hashes to be compatible with other people's hashes and other people's toolsets, so they want to choose whatever algorithm everyone else is going to choose.)

3. It must be just as easy to spell, remember, say, and type as "md5sum". If we tell people "Stop using md5sum and start using 'cksum -a BLAKE2b'.", then they will keep using md5sum. If we tell them "Stop using md5sum and start using b2sum.", then there is a chance they'll take our advice.

Thank you for your kind attention! If you would like to contribute to providing this feature, please let me know!

Regards,
Zooko
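To make the interface idea in item 3 concrete, here is a rough sketch of a b2sum-style command that prints lines in the same "<hex digest>  <filename>" layout md5sum uses. It is only an illustration under stated assumptions: it is written in Python for brevity (a real coreutils tool would be C), it relies on Python's hashlib.blake2b with its default 512-bit digest, and the names b2sum_line and main are hypothetical. Unlike md5sum, it does not read standard input and has no --check mode.

```python
import hashlib
import sys
from typing import List


def b2sum_line(path: str) -> str:
    """Return an md5sum-style line, "<hex digest>  <path>", using BLAKE2b.

    Hypothetical helper for illustration only; not an existing coreutils API.
    """
    # hashlib.blake2b() defaults to a 64-byte (512-bit) digest.
    h = hashlib.blake2b()
    with open(path, "rb") as f:
        # Read in fixed-size chunks so large files need not fit in memory.
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    # Two spaces between digest and name, matching md5sum's text-mode layout.
    return f"{h.hexdigest()}  {path}"


def main(argv: List[str]) -> int:
    # Hash each file named on the command line; with no arguments this
    # sketch simply prints nothing (md5sum would read stdin instead).
    for path in argv[1:]:
        print(b2sum_line(path))
    return 0


if __name__ == "__main__":
    main(sys.argv)
```

Invoked as `python b2sum_sketch.py FILE...` (the script name is arbitrary), it behaves like a stripped-down md5sum, so switching should cost the user nothing beyond typing "b2" instead of "md5".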
