On Mon, 7 Jan 2002, Binand Raj S. wrote:

> PS: Did you mean 'optimal' when you said correct up there? Otherwise
> why isn't the 4000 character version more correct, so to speak?

Both the 4000 and 7000 byte versions are correct.  Friedl spends about 
five pages deriving the 4000 byte version.  Then he shows the 7000 byte 
version, which is the same regex reworked to speed up some matches.

For example:
   "([^"\\]|\\.)*"

       and

   "[^"\\]*(\\.|[^"\\])*"

both match a double-quoted string with backslash-escaped characters 
within.  The second one is faster.

This is because alternation slows down an NFA engine considerably: at 
each position it may have to try every alternative in turn, and it has 
to remember where to backtrack to if a later part of the match fails.

By putting a [^"\\]* first, we ensure that as long a run of unescaped
characters as possible is slurped before the alternation is reached.  At 
that point, the alternation takes over completely.  Since escaped 
characters are the exception rather than the rule in a double-quoted 
string, this works well.
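
Here is a rough sketch for checking that the two patterns accept the same
strings.  It uses Python's re module purely for illustration (Friedl's
discussion is not about Python), and the names SIMPLE, LEADING_RUN and
first_match are made up for this example.

```python
import re

# The two patterns from this message, as Python raw strings.
# SIMPLE hits the alternation on every character inside the quotes.
SIMPLE = re.compile(r'"([^"\\]|\\.)*"')
# LEADING_RUN first consumes a run of ordinary characters with a plain
# character class, and only then falls into the alternation.
LEADING_RUN = re.compile(r'"[^"\\]*(\\.|[^"\\])*"')


def first_match(pattern, text):
    """Return the first double-quoted string found in text, or None."""
    m = pattern.search(text)
    return m.group(0) if m else None


# Hypothetical sample inputs: plain, escaped backslash, escaped quotes,
# no quotes at all, and an unterminated string.
samples = [
    'say "hello world" now',
    r'path "C:\\temp" ok',
    r'she said "a \"quoted\" word" twice',
    'no quotes here',
    '"unterminated',
]

for s in samples:
    # Both patterns should agree on every sample.
    assert first_match(SIMPLE, s) == first_match(LEADING_RUN, s)
```

To compare speed on your own engine, you could wrap the first_match calls
in timeit.timeit; the numbers depend on the engine and the input, so none
are quoted here.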

Philip

-- 
I think Smithers picked me because of my motivational skills.

                -- Homer Simpson
                   Homer the Smithers

