On 5 June 2012 20:33, Dag Sverre Seljebotn <d.s.seljeb...@astro.uio.no> wrote:
> On 06/05/2012 08:02 PM, mark florisson wrote:
>>
>> On 5 June 2012 18:09, Dag Sverre Seljebotn <d.s.seljeb...@astro.uio.no> wrote:
>>>
>>> On 06/05/2012 07:01 PM, Dag Sverre Seljebotn wrote:
>>>>
>>>> On 06/05/2012 09:25 AM, Stefan Behnel wrote:
>>>>>
>>>>> Dag Sverre Seljebotn, 04.06.2012 21:44:
>>>>>>
>>>>>> This can cause crashes/stack smashes
>>>>>> etc. if there's lower-64bit-of-md5 collisions, but a) the
>>>>>> probability is incredibly small, b) it would only matter in
>>>>>> situations that should cause an AttributeError anyway, c) if we
>>>>>> really care, we can always use an interning-like mechanism to
>>>>>> validate on module loading that its hashes doesn't collide with
>>>>>> other hashes (and raise an exception "Congratulations, you've
>>>>>> discovered a phenomenal md5 collision, get in touch with cython
>>>>>> devs and we'll work around it right away").
>>>>>
>>>>> I'm not a big fan of such an attitude. If this happens at runtime,
>>>>> it can induce any cost from cheap-at-test-time to
>>>>> hugely-expensive-in-production. Thinking with my evil hat on, this
>>>>> can potentially be data triggered from the outside (e.g. if a JIT
>>>>> compiler is involved at one end), thus possibly even leading to a
>>>>> security hole.
>>>>>
>>>>> We should try to produce software that others can build a business on.
>>>>
>>>> Well, I'd build a business on something that fails with a 5e-7
>>>> probability any day :-) (given that you trust my estimates in the other
>>>> post; I think they were rather conservative myself)
>>>
>>> This was put the wrong way. The chance was 5e-7 that it would fail for
>>> anybody over the course of human history (and that was a rather
>>> pessimistic estimate).
>>>
>>> So a more "individual tack":
>>>
>>> Assume that the process contains 200 MB of method definitions alone, with
>>> each method definition being a 8 character string. (That should mean the
>>> executable should be several gigabytes :-))
>>>
>>> That puts the probability of collision at 10^-34 for that process
>>> containing a 64-bit hash collision.
>>>
>>> Dag
>>> _______________________________________________
>>> cython-devel mailing list
>>> cython-devel@python.org
>>> http://mail.python.org/mailman/listinfo/cython-devel
>>
>> The point is not so much running into this problem accidentally, but
>> maliciously. If user input from untrusted users can somehow determine
>> the function signatures that are generated and called by a JIT, then a
>> malicious user can find collisions offline and cause some fault in a
>> valid user program.
>
> This took me a while to understand. So the idea is that you're in a
> completely managed environment (like Java), and you want to run untrusted
> code and have it not segfault or smash the stack. Eve then cleverly
> assembles a caller/callee pair with mismatching signatures but the same
> hash.
>
> Yes, in that situation 64 bits is perhaps not enough.
>
> But is this relevant to what we're trying to do here? We're discussing APIs
> to talk between Python C extension modules that already have unlimited
> powers. I'd think a "managed Cython" would be such a large change that one
> could easily change the hash size at that point?
>
> But I agree it's not as easily written off as I thought.
>
> Dag

It doesn't even necessarily have to be about running user code; a user
could craft data input which causes such a situation. For instance, let's
say we have a just-in-time specializer which specializes a function for
the runtime input types, and the types depend on the user input. Suppose
we write a web application that accepts posted arrays described by a
custom dtype and draws pictures from them in some weird way for us. An
attacker can get it to specialize pretty much any array type, so that
gives them a good opportunity to find collisions.
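
To make the "find collisions offline" part concrete, here is a rough sketch of a birthday search over signature strings. It is only an illustration under assumptions: the signature format `draw(custom_dtype_N[:])` is made up, `truncated_md5` and `find_collision` are hypothetical helpers, and I take "lower 64 bits" to mean the low bits of the digest read as a big-endian integer. The demo truncates to 24 bits so that it finishes instantly.

```python
import hashlib
from itertools import count


def truncated_md5(signature: str, bits: int = 64) -> int:
    """Return the low `bits` bits of the MD5 digest of `signature`.

    Assumption: the digest is read as a big-endian integer; the thread
    does not say which 64 bits of the digest would be kept.
    """
    digest = hashlib.md5(signature.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") & ((1 << bits) - 1)


def find_collision(bits: int):
    """Birthday search: return two distinct signatures with equal hashes."""
    seen = {}  # truncated hash -> first signature that produced it
    for i in count():
        # Hypothetical signature format; stands in for an attacker-chosen dtype.
        signature = f"draw(custom_dtype_{i}[:])"
        h = truncated_md5(signature, bits)
        if h in seen:
            return seen[h], signature, h
        seen[h] = signature


if __name__ == "__main__":
    a, b, h = find_collision(bits=24)
    print(f"{a!r} and {b!r} share the 24-bit hash {h:#08x}")
```

A search like this needs on the order of 2^(bits/2) candidates, so a few thousand for 24 bits and roughly 2^32 for a 64-bit hash. That is the sense in which an attacker who can choose the specialized types does not face the same odds as an accidental collision.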