On Wed, Mar 1, 2017 at 7:02 AM, FUJIWARA Katsunori <fo...@lares.dti.ne.jp> wrote: > # HG changeset patch > # User FUJIWARA Katsunori <fo...@lares.dti.ne.jp> > # Date 1488380487 -32400 > # Thu Mar 02 00:01:27 2017 +0900 > # Node ID 018d9759cb93f116007d4640341a82db6cf2d45c > # Parent 0bb3089fe73527c64f1afc40b86ecb8dfe7fd7aa > similar: allow similarity detection to use sha256 for digesting file contents > > Before this patch, similarity detection logic (used for addremove and > automv) uses SHA-1 digesting. But this cause incorrect rename > detection, if: > > - removing file A and adding file B occur at same committing, and > - SHA-1 hash values of file A and B are same
If rename detection in this very special case is all that's lost, I don't think the extra complexity (code and config) is worth it. > > This may prevent security experts from managing sample files for > SHAttered issue in Mercurial repository, for example. > > https://security.googleblog.com/2017/02/announcing-first-sha1-collision.html > https://shattered.it/ > > Hash collision itself isn't so serious for core repository > functionality of Mercurial, described by mpm as below, though. > > https://www.mercurial-scm.org/wiki/mpm/SHA1 > > HOW ABOUT: > > - which should we use default algorithm SHA-1, SHA-256 or SHA-512 ? > > ease (handling problematic files safely by default) or > performance? > > - what name of config knob is reasonable to control digesting algorithm ? > > or should we fully switch to more secure digest alrgorithm ? Yes, I'd rather wait until we do that. > > BTW, almost all of SHA-1 clients in Mercurial source tree applies it > on other than file contents (e.g. list of parent hash ids, file path, > URL, and so on). But some SHA-1 clients below apply it on file > contents. > > - patch.trydiff() > > SHA-1 is applied on "blob SIZE\0" header + file content (only if > experimental.extendedheader.index is enabled) > > - largefiles > > The former should be less serious. > > On the other hand, the latter causes unintentional unification between > largefiles, which cause same SHA-1 hash value, at checking files out. > > diff --git a/mercurial/similar.py b/mercurial/similar.py > --- a/mercurial/similar.py > +++ b/mercurial/similar.py > @@ -12,9 +12,15 @@ import hashlib > from .i18n import _ > from . import ( > bdiff, > + error, > mdiff, > ) > > +DIGESTERS = { > + 'sha1': hashlib.sha1, > + 'sha256': hashlib.sha256, > +} > + > def _findexactmatches(repo, added, removed): > '''find renamed files that have no changes > > @@ -23,19 +29,25 @@ def _findexactmatches(repo, added, remov > ''' > numfiles = len(added) + len(removed) > > + digest = repo.ui.config('ui', 'similaritydigest', 'sha256') > + if digest not in DIGESTERS: > + raise error.ConfigError(_("ui.similaritydigest value is invalid: %s") > + % digest) > + digester = DIGESTERS[digest] > + > # Get hashes of removed files. > hashes = {} > for i, fctx in enumerate(removed): > repo.ui.progress(_('searching for exact renames'), i, total=numfiles, > unit=_('files')) > - h = hashlib.sha1(fctx.data()).digest() > + h = digester(fctx.data()).digest() > hashes[h] = fctx > > # For each added file, see if it corresponds to a removed file. > for i, fctx in enumerate(added): > repo.ui.progress(_('searching for exact renames'), i + len(removed), > total=numfiles, unit=_('files')) > - h = hashlib.sha1(fctx.data()).digest() > + h = digester(fctx.data()).digest() > if h in hashes: > yield (hashes[h], fctx) > > _______________________________________________ > Mercurial-devel mailing list > Mercurial-devel@mercurial-scm.org > https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel _______________________________________________ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel