I suspect there are other forms of regex's too other than backlashes that can cause problem .
For example , the context sensitive grammar example given in the link here .<http://nikic.github.io/2012/06/15/The-true-power-of-regular-expressions.html>does not have back lashes ! Since the acceptance of CSG is UnDecidable this could cause attacks too according to langsec principles . Is my understanding correct ? Regards, Sashank http://lnkd.in/88sgfr On Sun, Oct 27, 2013 at 1:13 PM, Harald Lampesberger < [email protected]> wrote: > Hey Sashank, > > Imagining there are no back references. > > For example , How do we fix simple evil > regex's<https://www.owasp.org/index.php/Regular_expression_Denial_of_Service_-_ReDoS>like > the one mentioned here say (a+)+ > > Does this simple regular expression match Regular grammar ? or CFG or CSG > ? > Since they are causing ReDOS attacks , can we say they are CSG or above ? > > > In my opinion there is no evil (backref-free) regex, there are only > implementations that (may) suck. The given expression is regular in the > formal sense because there are no backreferences. The evilness comes from > the implementation of the matcher. Because it is regular, we know that > there is a unique deterministic finite state automaton, but the number of > states can be exponential in the length of the expression. This implies > that the conversion from expression to matcher can take exponential time in > the worst case. > A backtracking implementation now safes time by skipping this conversion > part; the expression is interpreted like a program and nondeterminism is > shifted to runtime. A matcher starts of with a single thread datastructure > in memory and with every input to the matcher, the thread moves along the > expression from left to right. In the case of nondeterminism (*,+), the > thread splits: the current thread datastructure becomes two with different > progressions in the expression. > The given evil expression now splits with every input, therefore the > number of thread datastructures in memory grows exponentially with every > input and so does the time necessary to progress an input. This is your > DoS! No CFG or CSG magic, just complexity. > > Why do implementations backtrack instead of using nice automata? Because > you need backtracking for backreferences in PCRE! Libs like google RE2 drop > backreferences from PCRE, so all the cool automata-theoretic concepts work. > > Also how can we determine (programmatically) whether given a regex it's > equivalent grammar is CSG or above (thus undecidable) and can potentially > cause REDOS ? > > > Backreferences. The moment you need them, you should step back and ask > yourself why you need them. Either the expression can be rewritten to stay > regular, or you should think more about the problem you want to solve and > use an appropriate parser instead. > > Regards, Harald > > > _______________________________________________ > langsec-discuss mailing list > [email protected] > https://mail.langsec.org/cgi-bin/mailman/listinfo/langsec-discuss > >
_______________________________________________ langsec-discuss mailing list [email protected] https://mail.langsec.org/cgi-bin/mailman/listinfo/langsec-discuss
