Hey Sashank,

> Imagining there are no back references. 
> For example , How do we fix simple evil regex's like  the one mentioned here 
> say (a+)+ 
> Does this simple regular expression match Regular grammar ? or CFG or CSG ? 
> Since they are causing ReDOS attacks , can we say they are CSG or above ?
> 

In my opinion there is no evil (backref-free) regex, there are only 
implementations that (may) suck. The given expression is regular in the formal 
sense because there are no backreferences. The evilness comes from the 
implementation of the matcher. Because it is regular, we know that there is a 
unique deterministic finite state automaton, but the number of states can be 
exponential in the length of the expression. This implies that the conversion 
from expression to matcher can take exponential time in the worst case. 
A backtracking implementation now safes time by skipping this conversion part; 
the expression is interpreted like a program and nondeterminism is shifted to 
runtime. A matcher starts of with a single thread datastructure in memory and 
with every input to the matcher, the thread moves along the expression from 
left to right. In the case of nondeterminism (*,+), the thread splits: the 
current thread datastructure becomes two with different progressions in the 
expression.
The given evil expression now splits with every input, therefore the number of 
thread datastructures in memory grows exponentially with every input and so 
does the time necessary to progress an input. This is your DoS! No CFG or CSG 
magic, just complexity.

Why do implementations backtrack instead of using nice automata? Because you 
need backtracking for backreferences in PCRE! Libs like google RE2 drop 
backreferences from PCRE, so all the cool automata-theoretic concepts work.

> Also how can we determine (programmatically) whether given a regex it's 
> equivalent grammar is CSG or above (thus undecidable) and can potentially 
> cause REDOS ?  

Backreferences. The moment you need them, you should step back and ask yourself 
why you need them. Either the expression can be rewritten to stay regular, or 
you should think more about the problem you want to solve and use an 
appropriate parser instead. 

Regards, Harald

_______________________________________________
langsec-discuss mailing list
[email protected]
https://mail.langsec.org/cgi-bin/mailman/listinfo/langsec-discuss

Reply via email to