On Mon, 7 Jan 2002, Binand Raj S. wrote:
> PS: Did you mean 'optimal' when you said correct up there? Otherwise
> why isn't the 4000 character version more correct, so to speak?
Both the 4000 byte and the 7000 byte versions are correct. Friedl spends
about five pages deriving the 4000 byte version. Then he shows the 7000
byte version, which is the same regex reworked to speed up some matches.
For example:
"([^"\\]|\\.)*"
and
"[^"\\]*(\\.|[^"\\])*"
both match a double-quoted string containing backslash-escaped
characters. The second one is faster.
This is because alternation is expensive for a backtracking NFA engine:
at every position it has to try the alternatives in turn and remember
where to come back to if the rest of the match fails.
By putting a [^"\\]* first, we ensure that as long a run of unescaped
characters as possible is slurped up before the alternation is reached.
From that point on, the alternation handles the rest of the string.
Since escaped characters are the exception rather than the rule in a
double-quoted string, this works well.
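
If you want to try this yourself, here is a rough sketch in Python, whose
re module is a backtracking engine. The names SIMPLE, LEADING_RUN and
matched_text, and the test strings, are made up for this illustration;
they are not from Friedl's book. It first checks that both regexes match
the same text, then times them on a long string with a single escaped
quote in the middle. How big the difference is will depend on the engine
and on the input, so treat the timings as indicative only.

```python
import re
import timeit

# The two patterns from this message, written as raw strings.
SIMPLE = re.compile(r'"([^"\\]|\\.)*"')
LEADING_RUN = re.compile(r'"[^"\\]*(\\.|[^"\\])*"')


def matched_text(pattern, text):
    """Return the first quoted string found in text, or None."""
    m = pattern.search(text)
    return m.group(0) if m else None


# Both patterns should agree on what they match (or fail to match).
samples = [
    '"plain"',
    r'say "a \"quoted\" word" now',
    r'"ends with backslash \\"',
    '"unterminated',
]
for s in samples:
    assert matched_text(SIMPLE, s) == matched_text(LEADING_RUN, s)

# A long quoted string that is mostly unescaped characters.
subject = '"' + 'x' * 5000 + r'\"' + 'y' * 5000 + '"'
for name, pat in (("simple", SIMPLE), ("leading run", LEADING_RUN)):
    secs = timeit.timeit(lambda: pat.search(subject), number=200)
    print(f"{name:12s} {secs:.4f} s")
```

Note that only the characters before the first backslash are handled
without alternation in the second pattern; everything after it goes
through the alternation again.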
Philip
--
I think Smithers picked me because of my motivational skills.
-- Homer Simpson
Homer the Smithers