Hello, folks! It's been about a year since the last update in this thread. I was reminded of it because I saw a well-intentioned project using "md5sum" to let its users identify a specific distribution of its software. Of course, md5sum is not secure for that!
Since md5sum is vulnerable to "collision attacks", it can't be used to safely identify one specific file. Instead, what it can do is identify that "this file is one of the files that the creator of this hash chose when they created this hash". That's different.
For example, if the computer on which the software was packaged were backdoored, it could generate multiple tarballs with the same md5sum, one containing backdoors and the other clean. Then whenever someone wanted to inspect the tarball for backdoors, the attacker could provide the clean one, and whenever a user wanted to download the software, the attacker could provide the backdoored one. Most users don't understand that md5sum can't protect them against this, so in this scenario they would check the md5sum, it would match, and they would proceed to use the backdoored software.
If people instead used a strong, collision-resistant hash function like sha256sum or b2sum, this hypothetical attacker would not be able to generate multiple packages matching the same hash. The attacker would have to decide whether to distribute the backdoored package to both the inspectors and the users, or the clean package to both. If they distributed different packages to different people, then someone would receive a package that did not match the hash.
A lot of people fail to understand this subtle difference, but it is really important. A collision-vulnerable hash like md5sum doesn't make a hash that matches _only one specific file_. Instead, it makes a hash that matches _a set of files chosen by the creator of the hash_. A collision-resistant hash like sha256sum or b2sum makes a hash that matches _only one specific file_.
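To make the distinction concrete, here is a minimal sketch, in Python with only the standard-library hashlib, of the check users perform when they compare a download against a published hash. The function names file_digest and matches_published_hash are hypothetical, chosen for illustration. The check is the same for every algorithm; what differs is what a match proves. With "sha256" or "blake2b" a match pins down one file, while with "md5" it only shows the file is in a set that whoever produced the hash may have prepared in advance.

```python
import hashlib


def file_digest(path: str, algorithm: str = "blake2b", chunk_size: int = 1 << 16) -> str:
    """Hash a file in chunks and return its lowercase hex digest.

    `algorithm` is any name accepted by hashlib.new(), e.g. "sha256" or "blake2b".
    """
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # Read fixed-size chunks so large tarballs need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_published_hash(path: str, published_hex: str, algorithm: str = "blake2b") -> bool:
    """Return True if the file's digest equals the published hex digest.

    With a collision-resistant hash (sha256, blake2b) a match identifies one
    specific file. With md5 the same comparison passes for every file in a
    colliding set that the hash's creator may have generated on purpose.
    """
    return file_digest(path, algorithm) == published_hex.strip().lower()
```

For example, `matches_published_hash("release.tar.gz", published, "sha256")` (with a hypothetical file name and a published hex string) returns True only for the exact bytes that were hashed, assuming SHA-256 remains collision-resistant.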
Okay, so that's why I care about this, and why we all agreed in principle more than a year ago that replacing md5sum with something better was consistent with the GNU project's mission of helping protect users from being harmed through their networked software.
I'm coming back to this thread now because openssl-1.1.0 is the stable release of openssl, and it comes with BLAKE2b! It should now be a very simple patch to add BLAKE2b to:
http://git.savannah.gnu.org/cgit/coreutils.git/tree/src/md5sum.c
That is much simpler than the more involved patch we were discussing earlier, which would have copied a BLAKE2b implementation into the coreutils tree.
Note: I really think it is important that the resulting executable be named "b2sum". That name is easy to remember and spell, and it is what other implementations of the same hash function are called.
If the goal is to displace md5sum to protect people, there are two things we _must not_ do:
1. Offer a replacement which is slower, e.g. sha256sum. It has been offered for at least a decade and is hardly displacing md5sum at all. I think we need to offer a replacement which is faster.
2. Offer a replacement which is less usable, e.g. "cksum -a blake2b". The added effort of remembering that, and the difficulty of casually mentioning it to someone else in passing, will inhibit adoption, and will prevent people who already use "b2sum" on other systems from realizing that GNU coreutils contains a compatible implementation.
So, let's go ahead and offer a replacement which is safe, faster, and just as usable! :-)
Sincerely,
Zooko
On Mon, Oct 12, 2015 at 12:27 AM, Pádraig Brady <[email protected]> wrote:
> On 11/10/15 17:59, Zooko Wilcox-OHearn wrote:
>> Folks:
>>
>> Earlier in this discussion Pádraig Brady asked ¹ if we had submitted
>> BLAKE2 for inclusion in openssl. We have ², but they haven't yet
>> included it ³.
>>
>> Eventually, I think, openssl will support highly optimized
>> implementations of BLAKE2, but I think it will be a long time before
>> that is widely deployed and GNU coreutils can rely on it.
>>
>> In the meantime, could we go ahead and use the portable C reference
>> implementation ⁴, or the even smaller RFC implementation ⁵?
>>
>> Here is the code for the "b2sum" command-line tool that comes with the
>> reference implementation: ⁶
>>
>> Regards,
>>
>> Zooko
>>
>> ¹ http://lists.gnu.org/archive/html/coreutils/2015-06/msg00011.html
>> ² https://mta.openssl.org/pipermail/openssl-dev/2015-June/001688.html
>> ³ http://article.gmane.org/gmane.comp.encryption.openssl.devel/30514/match=blake2
>> ⁴ https://github.com/BLAKE2/BLAKE2/blob/master/ref/blake2b-ref.c
>> ⁵ https://github.com/mjosaarinen/blake2_mjosref/blob/master/blake2b.c
>> ⁶ https://github.com/BLAKE2/BLAKE2/blob/master/b2sum/b2sum.c
>
> Yes this is still on the cards for inclusion.
> Thanks for summarizing the links.
>
> RMS has stated in a previous thread that CC0 code is compat with the GPL,
> and that we don't need copyright assignment for copies of external libs.
> With that in mind we'll look at adding b2 possibly to gnulib,
> but available in coreutils in any case.
>
> thanks,
> Pádraig.
--
Regards,
Zooko Wilcox
Founder and CEO
https://LeastAuthority.com — Freedom matters.
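For readers who want to try the proposal's two claims (a "b2sum" that is as easy to use as md5sum, and a hash that is faster than sha256sum), here is a rough sketch in Python using only the standard-library hashlib. It is not the coreutils patch, which would be C code in src/md5sum.c calling into openssl; the names b2sum_line and throughput_mib_s are hypothetical. b2sum_line assumes the output should follow md5sum's "<hex digest>  <file name>" text layout, using unkeyed BLAKE2b with its default 512-bit digest. throughput_mib_s gives only a rough single-threaded comparison: results depend on the CPU and on how Python's hashlib was built, and on CPUs with SHA hardware extensions SHA-256 may come out ahead.

```python
import hashlib
import sys
import time


def b2sum_line(path: str) -> str:
    """Return '<BLAKE2b-512 hex digest>  <path>', the md5sum-style text layout."""
    h = hashlib.blake2b()  # unkeyed BLAKE2b, 64-byte (512-bit) digest by default
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return f"{h.hexdigest()}  {path}"


def throughput_mib_s(algorithm: str, data: bytes, rounds: int = 10) -> float:
    """Hash `data` `rounds` times with hashlib and return MiB hashed per second."""
    start = time.perf_counter()
    for _ in range(rounds):
        hashlib.new(algorithm, data).digest()
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero interval
    return len(data) * rounds / (1024 * 1024) / elapsed


if __name__ == "__main__":
    if len(sys.argv) > 1:
        # With file arguments: behave like a minimal b2sum.
        for name in sys.argv[1:]:
            print(b2sum_line(name))
    else:
        # Without arguments: rough speed comparison on 16 MiB of zero bytes.
        payload = bytes(16 * 1024 * 1024)
        for name in ("md5", "sha256", "blake2b"):
            print(f"{name:8s} {throughput_mib_s(name, payload):8.1f} MiB/s")
```

Run with file names (e.g. a hypothetical `release.tar.gz`) to get one line per file; run without arguments to print one MiB/s figure per algorithm for the current machine.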
