On Mon, 28 Jan 2013 19:46:13 +0100, Richard Hipp <[email protected]> wrote:
I think another point is that the Lua regexp does not do anchoring (or at
least I didn't see it - did I miss something?)
See also this passage from http://www.lua.org/pil/20.4.html:
Usually, pattern matching is efficient enough for Lua programs: A Pentium
333MHz (which is not a fast machine by today's standards) takes less than
a tenth of a second to match all words in a text with 200K characters (30K
words). But you can take precautions. You should always make the pattern
as specific as possible; loose patterns are slower than specific ones. An
extreme example is '(.-)%$', to get all text in a string up to the first
dollar sign. If the subject string has a dollar sign, everything goes
fine; but suppose that the string does not contain any dollar signs. The
algorithm will first try to match the pattern starting at the first
position of the string. It will go through all the string, looking for a
dollar. When the string ends, the pattern fails for the first position of
the string. Then, the algorithm will do the whole search again, starting
at the second position of the string, only to discover that the pattern
does not match there, too; and so on. This will take a quadratic time,
which results in more than three hours in a Pentium 333MHz for a string
with 200K characters. You can correct this problem simply by anchoring the
pattern at the first position of the string, with '^(.-)%$'. The anchor
tells the algorithm to stop the search if it cannot find a match at the
first position. With the anchor, the pattern runs in less than a tenth of
a second.
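
A minimal sketch of the comparison described above, assuming a stock Lua
interpreter. The subject length (5000) and the helper name time_match are
arbitrary choices for illustration, and the timings depend on the machine,
so no specific numbers are claimed.

```lua
-- Subject with no dollar sign: the worst case described in the passage.
local subject = string.rep("a", 5000)

-- Hypothetical helper: returns the match result and the CPU time used.
local function time_match(pattern)
  local start = os.clock()
  local result = string.match(subject, pattern)
  return result, os.clock() - start
end

local r1, t1 = time_match("(.-)%$")   -- unanchored: retried at every position
local r2, t2 = time_match("^(.-)%$")  -- anchored: tried only at position 1

print(r1, r2)  -- both nil, since there is no '$' to find
print(t1, t2)  -- the unanchored time should grow much faster with length

-- When a '$' is present, both patterns capture the same prefix.
local with_dollar = "price: 10$ and 20$"
local c1 = string.match(with_dollar, "(.-)%$")
local c2 = string.match(with_dollar, "^(.-)%$")
print(c1, c2)
```

Increasing the length passed to string.rep should make the gap between the
two timings more visible.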
Beware also of empty patterns, that is, patterns that match the empty
string. For instance, if you try to match names with a pattern like '%a*',
you will find names everywhere:
    i, j = string.find(";$% **#$hello13", "%a*")
    print(i,j) --> 1 0
In this example, the call to string.find has correctly found an empty
sequence of letters at the beginning of the string.
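
One possible way around the empty match, assuming the intent is to find at
least one letter, is to use '+' (one or more) instead of '*' (zero or more).
This is only a sketch of that substitution, not something the quoted text
prescribes.

```lua
local subject = ";$% **#$hello13"

-- '%a*' accepts zero letters, so it succeeds immediately at position 1.
local i1, j1 = string.find(subject, "%a*")

-- '%a+' requires at least one letter, so it skips ahead to "hello".
local i2, j2 = string.find(subject, "%a+")

print(i1, j1)                        --> 1 0
print(i2, j2)                        --> 9 13
print(string.sub(subject, i2, j2))   --> hello
```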
It never makes sense to write a pattern that begins or ends with the
modifier '-', because it will match only the empty string. This modifier
always needs something around it, to anchor its expansion. Similarly, a
pattern that includes '.*' is tricky, because this construction can expand
much more than you intended.
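
A short sketch of both warnings, using made-up subject strings: a trailing
'-' with nothing after it matches the empty string, and '.*' takes as much
as it can where '.-' takes as little as it can.

```lua
-- A pattern ending in '-' has nothing forcing it to expand.
local lazy_alone = string.match("hello world", "%a-")

-- With an anchor before it and a space after it, the same '-' is useful.
local lazy_bounded = string.match("hello world", "^(%a-) ")

-- Greedy '.*' runs to the last ']', lazy '.-' stops at the first one.
local greedy = string.match("[a] [b]", "%[(.*)%]")
local lazy   = string.match("[a] [b]", "%[(.-)%]")

print(("%q"):format(lazy_alone))  --> ""
print(lazy_bounded)               --> hello
print(greedy)                     --> a] [b
print(lazy)                       --> a
```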