On Mon, 28 Jan 2013 19:46:13 +0100, Richard Hipp <[email protected]> wrote:

I think another point is that the Lua regexp does not do anchoring (or at
least I didn't see it - did I miss something?)

see also here (from http://www.lua.org/pil/20.4.html):

Usually, pattern matching is efficient enough for Lua programs: A Pentium 333MHz (which is not a fast machine by today's standards) takes less than a tenth of a second to match all words in a text with 200K characters (30K words). But you can take precautions. You should always make the pattern as specific as possible; loose patterns are slower than specific ones. An extreme example is '(.-)%$', to get all text in a string up to the first dollar sign. If the subject string has a dollar sign, everything goes fine; but suppose that the string does not contain any dollar signs. The algorithm will first try to match the pattern starting at the first position of the string. It will go through all the string, looking for a dollar. When the string ends, the pattern fails for the first position of the string. Then, the algorithm will do the whole search again, starting at the second position of the string, only to discover that the pattern does not match there, too; and so on. This will take a quadratic time, which results in more than three hours in a Pentium 333MHz for a string with 200K characters. You can correct this problem simply by anchoring the pattern at the first position of the string, with '^(.-)%$'. The anchor tells the algorithm to stop the search if it cannot find a match at the first position. With the anchor, the pattern runs in less than a tenth of a second.

Beware also of empty patterns, that is, patterns that match the empty string. For instance, if you try to match names with a pattern like '%a*', you will find names everywhere:
    i, j = string.find(";$%  **#$hello13", "%a*")
    print(i,j)   --> 1  0
In this example, the call to string.find has correctly found an empty sequence of letters at the beginning of the string.

It never makes sense to write a pattern that begins or ends with the modifier `-´, because it will match only the empty string. This modifier always needs something around it, to anchor its expansion. Similarly, a pattern that includes '.*' is tricky, because this construction can expand much more than you intended.


--
Using Opera's revolutionary email client: http://www.opera.com/mail/
_______________________________________________
fossil-users mailing list
[email protected]
http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users

Reply via email to