I think venti could deal with it: Rwrite returns a score, Tread provides a
score, and the caller typically treats it as an opaque value. If it does
not, then whether a modified sha1 is returned or a new algorithm is used,
the caller still could not rely on sha1(block)=score.

In any case, fossil needs a fix to cope with venti returning "score
collision", so that it does not fail to archive once it hits one of the
SHAttered files, or rather the first venti-sized block of one.

On Mon, 27 Feb 2017, 21:37 Riddler, <riddler...@gmail.com> wrote:

> I think much in the same vein as git, venti doesn't need to worry too
> much about collisions given the behavior when collisions occur is
> well-defined and sensible in both systems.
> It's second preimages that are more of a concern (and still not
> possible with SHA1). The lack of preimage attacks on SHA1 prevents
> people from maliciously creating a file with the same hash as one you
> created. They can only duplicate ones they created which should limit
> the scope of any maliciousness to stuff they have control over.
> At the point preimages are practical, I'd want to be long gone from
> SHA1 but IIRC even MD5 still has no practical second-preimage attacks
> so we're probably a long way off from there.
>
> Technically, anything relying on venti should handle the collision
> detected response gracefully, as it's always a possibility no matter
> the algorithm.
> If fossil doesn't handle it very well perhaps it's not venti that
> needs changed (given it detects & reports) but fossil.
> A top-of-the-head suggestion would be for fossil to respond to the
> collision notice by doing something to the block that can be undone
> later (as others above have hinted at) such as appending something,
> XOR, etc., marking it as such in its own data structures then passing
> it back to venti. It could then reverse the operation when retrieving
> the files with the 'collision fixed' flag set.
> I don't know how feasible that idea is (been a while since I looked at
> fossil) but worth looking into maybe? It would seem, at a cursory
> glance, to fix the problem for fossil+venti indefinitely at the cost of a
> minor computational overhead for retrieving collided files.
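One way to picture the suggestion above, as a sketch only: the store below
is a toy, not venti, and archive_block, fetch_block, ScoreCollision and the
two-byte suffix are all invented for illustration. Scores are SHA-1
truncated to one byte so a collision can be forced, and the case of a block
that is already at the maximum size (so nothing can be appended) is ignored:

```python
import hashlib

PAD = 2  # assumption: length of the suffix appended to a colliding block


class ScoreCollision(Exception):
    pass


def toy_score(block: bytes) -> bytes:
    # Assumption: 1-byte truncation of SHA-1, only to make collisions cheap.
    return hashlib.sha1(block).digest()[:1]


class ToyVenti:
    """Hypothetical store that detects and reports collisions."""

    def __init__(self):
        self.blocks = {}

    def write(self, block: bytes) -> bytes:
        score = toy_score(block)
        if self.blocks.setdefault(score, block) != block:
            raise ScoreCollision(score.hex())
        return score

    def read(self, score: bytes) -> bytes:
        return self.blocks[score]


def archive_block(venti, block: bytes, max_tries: int = 16):
    """Return (score, pad); pad > 0 plays the role of the 'collision fixed' flag."""
    try:
        return venti.write(block), 0
    except ScoreCollision:
        pass
    for n in range(max_tries):
        try:
            return venti.write(block + n.to_bytes(PAD, "big")), PAD
        except ScoreCollision:
            continue
    raise ScoreCollision("no free score after %d tries" % max_tries)


def fetch_block(venti, score: bytes, pad: int) -> bytes:
    data = venti.read(score)
    return data[:-pad] if pad else data  # undo the suffix when flagged


def find_collision():
    seen, i = {}, 0
    while True:
        block = b"block %d" % i
        s = toy_score(block)
        if s in seen:
            return seen[s], block
        seen[s] = block
        i += 1
```

The flag has to live in the client's own metadata next to the score, which
is the part that would need the real design work in fossil.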
>
> As Charles pointed out, you could also just do that in venti; I guess
> it depends on whether the write API call contract in venti is "returns
> SHA1 of file" or "returns arbitrary file id".
> If the behavior was put into venti you couldn't assume the ID returned
> = sha1(block) anymore - but I don't know if anything relies on that
> behavior.
> As for venti, I wouldn't say 'no point' to an algorithm update, but
> I'd rather have fossil updated to manage to deal with collisions
> better first.
>
>
> On Mon, Feb 27, 2017 at 8:14 PM, Bakul Shah <ba...@bitblocks.com> wrote:
> > On Mon, 27 Feb 2017 19:02:29 GMT Charles Forsyth <charles.fors...@gmail.com> wrote:
> >> On 27 February 2017 at 18:30, Charles Forsyth <charles.fors...@gmail.com> wrote:
> >>
> >> > that's a separate argument that venti would never work for you,
> >> > regardless of the hash algorithm used.
> >
> >> since venti returns the resulting score from each write, and it knows
> >> whether there's been a collision,
> >> it appears it could return a modified score (having ensured that is now
> >> unique, "and the next judge said that's a very shaggy dog")
> >
> > Consider what can happen when you want to consolidate two venti
> > archives into another one. Each source venti has a different
> > file with the same hash. When you discover in the destination
> > venti that they collide, it is too late to return a modified
> > score -- you have to find and fix all pointer blocks that
> > refer to this block as well.
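The consolidation case can be played through with a toy model. Everything
here is an assumption for illustration: archives are plain dicts, blocks
are tagged tuples, and scores are SHA-1 truncated to three bytes so that a
collision can be brute-forced. None of it is venti's on-disk format:

```python
import hashlib

SCORE_LEN = 3  # assumption: truncated SHA-1, so a collision is findable


def toy_score(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()[:SCORE_LEN]


def put_data(archive: dict, data: bytes) -> bytes:
    score = toy_score(data)
    archive[score] = ("data", data)
    return score


def put_pointer(archive: dict, children: list) -> bytes:
    # A pointer block is just the list of its children's scores.
    score = toy_score(b"".join(children))
    archive[score] = ("ptr", tuple(children))
    return score


def merge(dest: dict, src: dict) -> list:
    """Copy src into dest; return scores already bound to a different block."""
    clashes = []
    for score, block in src.items():
        if dest.setdefault(score, block) != block:
            clashes.append(score)
    return clashes


def referrers(archive: dict, score: bytes) -> list:
    """Scores of pointer blocks in archive that name score as a child."""
    return [s for s, (kind, body) in archive.items()
            if kind == "ptr" and score in body]


def find_collision():
    seen, i = {}, 0
    while True:
        data = b"block %d" % i
        s = toy_score(data)
        if s in seen:
            return seen[s], data
        seen[s] = data
        i += 1
```

If two source archives each hold one half of a colliding pair, merge
reports the clash only at the destination, and referrers on the source
shows the pointer blocks that already embed the old score. Renaming the
block changes those pointer blocks, and so their scores, all the way up to
the root.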
> >
> > In theory the chance of a random collision with SHA1 may be
> > 1 in 2^80 but we have existing files that collide (unlike the
> > hypothetical argument of someone wanting to store 10^21 byte
> > size files -- but if they can produce it, we can store it!).
> > Your argument is that since venti is readonly, existing data
> > in it is not vulnerable but not everyone stores their archives
> > on readonly medium.  Another argument would be that almost
> > always venti is privately used and unlikely to be accessible
> > to the badguys.  Yet another argument is that hardly anyone
> > uses venti so why even bother. These are behavior patterns
> > that are true today but why limit its usefulness?
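For the 2^80 figure, a small worked example using the usual birthday
approximation. It covers only accidental collisions between random blocks,
so it says nothing about deliberately constructed pairs, which is the
point being made above. The function name and defaults are made up here:

```python
import math


def collision_probability(n_blocks: int, hash_bits: int = 160) -> float:
    """Birthday approximation: p ~= 1 - exp(-n(n-1) / 2^(bits+1))."""
    exponent = -n_blocks * (n_blocks - 1) / 2 ** (hash_bits + 1)
    # expm1 keeps precision when the exponent is tiny.
    return -math.expm1(exponent)
```

With a 160-bit hash, 2^80 stored blocks give a probability of roughly 0.39
that some pair collides by accident, while 2^40 blocks give a probability
below 10^-24.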
> >
> > Just as we move archived data we care about to more modern
> > media (as we no longer have easy access to floppies, 9-track
> > tapes, 1/4" streamer tape etc.), and update our crypto keys,
> > since they too have limited shelf-life, we can replace the use
> > of SHA1.  This is a fixable problem.  [It is much much worse
> > for git given the amount of s/w that relies on it. I think
> > it is a matter of time before someone comes up with a
> > collision between two different types of git objects (such as
> > a blob and a tree) but we'll let Linus worry about it :-)]
> >
> > The solution is to convert from sha1 to blake2b or something
> > strong and be prepared to move the data again in 10-20 years.
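A sketch of the re-scoring step such a conversion would need, assuming a
flat archive of data blocks held in a dict. The 32-byte BLAKE2b digest size
and the function names are choices made for this example, not anything
venti defines:

```python
import hashlib


def sha1_score(block: bytes) -> bytes:
    return hashlib.sha1(block).digest()


def blake2b_score(block: bytes) -> bytes:
    # Assumption: 32-byte digest; blake2b allows any size from 1 to 64.
    return hashlib.blake2b(block, digest_size=32).digest()


def rescore(old_archive: dict) -> tuple:
    """Return (new_archive, old_to_new) for a flat archive of data blocks."""
    new_archive, old_to_new = {}, {}
    for old, block in old_archive.items():
        new = blake2b_score(block)
        new_archive[new] = block
        old_to_new[old] = new
    return new_archive, old_to_new
```

Pointer blocks are left out: since they embed their children's scores,
they would have to be rewritten bottom-up with old_to_new and then
re-scored themselves.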
> >
>
>
