Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

Daniel Shahaf Mon, 06 Feb 2023 15:51:15 -0800

Karl Fogel wrote on Mon, Jan 30, 2023 at 17:26:03 -0600:
> On 29 Jan 2023, Evgeny Kotkov via dev wrote:
> > I have *absolutely* no idea where "being railroaded through" comes
> > from.  Really, it's a wrong way of portraying and thinking about the
> > events that have happened so far.
> > 
> > Reiterating over those events: I wrote an email containing my
> > thoughts and explaining the motivation for such change.  I didn't
> > reply to some of the questions (including some tricky questions,
> > such as the one featuring a theoretical hash function), because they
> > have been at least partly answered by others in the thread, and I
> > didn't have anything valuable to add at that time.
> > 
> > During that time, I was actively coding the core part of the change,
> > to check if it's possible technically.  Which is important, as far
> > as I believe, because not all theoretically possible solutions can
> > be implemented without facing significant practical or
> > implementation-related issues, and it seems to me that you
> > significantly undervalue such an approach.
> > 
> > I do not say my actions were exemplary, but as far as I can tell,
> > they're pretty much in line with how svn-dev has been operating so
> > far. But, it all resulted in an unclear veto without any _technical_
> > arguments, where what's being vetoed is unclear as well, because the
> > change was not ready at the moment the veto was cast.
> > 
> > And because your veto goes in favor of a specific process
> > (considering that no other arguments were given), the only thing
> > that's *actually* being railroaded is an odd form of an RTC
> > (review-then-commit) process that is against our usual CTR
> > (commit-then-review) [1,2].  That's railroading, because it hasn't
> > been explicitly discussed anywhere and a consensus on it has not
> > been reached.
> 
> Daniel, given what's in Evgeny's branch now, could you summarize your
> current technical objections if any?
> 
> If they are something like "This code is solving the wrong problem(s)" or
> "I'm not sure what problem(s) it's supposed to solve", those count as
> technical objections.  It's just that it would be useful to have the
> objection(s) gathered in one place. This thread has been long and somewhat
> digressive -- I'm not saying that's due to you -- and I at least have found
> it a bit difficult to keep track of the concrete objections versus various
> interesting but ultimately theoretical points.
>


Quoting my other reply just now:

    […] it's pretty simple.  [The OP] said "We should do Y because it
    addresses X".  [The OP] didn't explain why X needs to be addressed, didn't
    consider what alternatives there are to Y, didn't consider any cons that
    Y may have… and when people had questions, [the OP] just began to
    implement Y, without responding to or even acknowledging those
    questions.
    
    That's not how design discussions work.  A design discussion doesn't go
    "state decision; state pros; implement"; it goes "state problem; discuss
    potential solutions, pros, cons; decide; implement" (cf. [4, 5, 6]).
    
    That's why I called veto: not because I considered any particular
    proposal then on the table unreasonable, but because I considered /the
    decision process being used/ unreasonable (cf. [7]).

Concretely: Why would migrating away from SHA-1 be a good thing in the
first place?  Assuming that it /would/ be a good thing, what alternative
ways are there to achieve whatever the goodness may be (new feature /
bugfix / resilience to some attack vector / etc.)?  What are the
potential *downsides* of migrating away from SHA-1?

The same, restated at a higher level of abstraction: "Migrate
away from SHA-1" is a means, not an end.  Define the ends and have
a non-predetermined-outcome discussion on how to achieve them.

"Reduce the security impact to our users of second-preimage attacks
against SHA-1" would be an end.  I don't know whether it's the only one
or whether there are additional ones.

[As to the branch, I'm not sure whether to restate my position on it or
not — so I'll restate it, erring on the side of including too much
rather than too little, but feel free to ignore the following paragraph
at will:]

Was the branch commenced as a PoC / smoke test, to explore one proposed
direction and to be discarded if the consensus compass should end up
pointing towards another cardinal direction?  Or was it commenced on the
assumption that consensus on migrating from SHA-1 to SHA-256 went without
saying, had already formed, or would necessarily have formed by 1.15.0-rc1?

> The reason I'm supportive of Evgeny's direction is that his changes, if
> completed, would offer a solution to the (admittedly still somewhat distant)
> security concern I raised early on. Essentially, I'm worried that
> second-preimage attacks on SHA-1 are coming eventually (maybe I'm wrong
> about this -- they are after all significantly harder than mere collision
> attacks).  *If* such attacks become possible, then our WC could report a
> file as unmodified when in fact it is modified, which would have real
> security implications, as I outlined.
> 

I take it you're referring to this:

    https://mail-archives.apache.org/mod_mbox/subversion-dev/202212.mbox/%3C87k02dr4mn.fsf%40red-bean.com%3E
    I have put WordPress installations under Subversion version control before.
    Once, I detected an attack on one of those WordPress servers when one of the
    things the attacker did was modify some of the WordPress scripts on the
    server.  Those files showed up as modified when I ran 'svn st', and from
    there I ran 'svn diff' and figured out what had happened.  But a
    super-careful attacker could make modifications that leave the
    version-controlled files with the same SHA1 hash they had before, thus
    making it harder to detect the attack.
    
    Yes, I realize there are other ways to detect modifications, and that random
    attackers are unlikely to take the trouble to preserve hashes.  On the other
    hand, a well-resourced spear-phishing attacker who knows something about the
    usage of SVN at their target might indeed try a hash-preserving approach to
    breaking in. The point is, if we're counting on the hashes having certain
    semantics, then our users are counting on it too.  If SHA1 no longer has
    those semantics, we should upgrade.

I offered one alternative counter to that here:

    https://mail-archives.apache.org/mod_mbox/subversion-dev/202301.mbox/%3Cadacbb6f-e0cb-4e5b-8603-0eda19f93b3c%40app.fastmail.com%3E
    So, suppose the wc didn't hardcode _any particular_ hash function for
    naming pristines and for status walks — not md5, not sha1, not sha256 —
    but had each «svn checkout» run pick a hash function uniformly at random
    out of a large enough family of hash functions[1].  (Intuitively, think
    of a family of hash functions as a hash function with a random salt,
    similar to [2].)
    
    This way, even if someone tried to deliberately create a collision, they
    wouldn't be able to pick a collision "off the shelf", as with
    shattered.io; they'd need to compute a collision for the specific hash
    function ("salt") used by that particular wc.  That's more difficult than
    creating a collision in a well-known hash function, regardless of
    whether we treat the salt's value as a secret of the wc (as in, stored
    in a mode-0400 file under the .svn directory and not disclosed to the
    server) or as a value the attacker is assumed to know.
    
    So, that's one way to address [the WordPress scenario].
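
To make the "family of hash functions" idea concrete, here is a minimal sketch, not anything that exists in Subversion. It assumes HMAC-SHA1 keyed with a per-working-copy random salt as a stand-in for "pick a hash function at random from a family"; the names new_wc_salt, salted_checksum and is_modified are hypothetical.

```python
import hashlib
import hmac
import secrets


def new_wc_salt(nbytes: int = 32) -> bytes:
    """Pick a fresh random salt, as a hypothetical 'svn checkout' might do once."""
    return secrets.token_bytes(nbytes)


def salted_checksum(salt: bytes, content: bytes) -> str:
    """Checksum of `content` under the hash function selected by `salt`.

    Assumption: HMAC-SHA1 plays the role of the salted hash family here.
    """
    return hmac.new(salt, content, hashlib.sha1).hexdigest()


def is_modified(salt: bytes, recorded_checksum: str, content: bytes) -> bool:
    """Status-walk style check: does the file still match its recorded checksum?"""
    current = salted_checksum(salt, content)
    return not hmac.compare_digest(current, recorded_checksum)
```

The same content gets a different checksum in every working copy, so a collision would have to be computed against the one salt in use rather than taken off the shelf.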

And I analysed the marginal attack difficulty if we change the checksum
algorithm here:

    https://mail-archives.apache.org/mod_mbox/subversion-dev/202301.mbox/%3C20230121102455.GB3174%40tarpaulin.shahaf.local2%3E
    For example, if we used another checksum algorithm, the attacker from
    your scenario might opt to edit the base checksums in .svn/wc.db and
    rename the .svn/pristine/ files accordingly.  That's much easier to pull
    off, and will be easy to adapt if we change the algorithm again, but on
    the other hand, requires write access to the .svn directory and is
    easier to discover.

In any case, even assuming second-preimage attacks against SHA-1 are
something we should assume adversaries capable of [and I'm not
expressing any opinion on this question], it does not /automatically/
follow that we should migrate away from SHA-1:

    https://mail-archives.apache.org/mod_mbox/subversion-dev/202301.mbox/%3C20230121102455.GB3174%40tarpaulin.shahaf.local2%3E
    To be clear, is this what you're saying? —
    .
        Premise: There is a collision attack against SHA-1.
        Conclusion: Subversion should stop using SHA-1.

    This conclusion does not follow from this premise.  For instance, FSFS
    checks for collisions, so it can actually use "File length in bytes" as
    a checksum […]
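
The point about checking for collisions can be sketched as follows. This is an illustration of the principle only and is not FSFS's actual code: a store that treats the checksum as a lookup hint and compares full contents before sharing an entry stays correct even with a checksum as weak as the length in bytes. The class name DedupStore and its methods are hypothetical.

```python
from collections import defaultdict


class DedupStore:
    """Content store that deduplicates by checksum but verifies every match."""

    def __init__(self, checksum=len):
        # The checksum only narrows the search; `len` is deliberately weak.
        self._checksum = checksum
        self._buckets = defaultdict(list)

    def add(self, content: bytes):
        """Store `content` and return a reference (checksum, index in bucket)."""
        key = self._checksum(content)
        bucket = self._buckets[key]
        for index, existing in enumerate(bucket):
            if existing == content:  # full comparison, so collisions are harmless
                return key, index
        bucket.append(content)
        return key, len(bucket) - 1

    def get(self, ref) -> bytes:
        """Fetch the content previously stored under `ref`."""
        key, index = ref
        return self._buckets[key][index]
```

A weak checksum costs performance here, since more candidates must be compared byte for byte, but two different contents are never conflated.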

And to be clear: I'm not saying Subversion should continue using SHA-1,
and I'm not saying that Subversion should stop using SHA-1.  I'm saying
we should consider what the alternatives to that are.

> Like I said, this is far from urgent, and IMHO it certainly should not delay
> a release of our new pristineless feature.  But when and if Evgeny's branch
> is ready (where "ready" presumably includes something other than salted
> SHA-1 as the other checksum option), I would like to see these changes go
> in, unless we identify some harm from them.
> 
> For everyone's ease of reference:
> 
> $ svn cat https://svn.apache.org/repos/asf/subversion/branches/pristine-checksum-kind/BRANCH-README
> 
> $ svn log --stop-on-copy https://svn.apache.org/repos/asf/subversion/branches/pristine-checksum-kind/
> 
> Best regards,
> -Karl

Thanks for allowing me the time to write a proper response :)

Daniel
