Hi

Another micro-optimization I can think of: when a pattern has
alternatives, as in <token regexp="yes">vor|nach|in</token>,
I suppose it is faster to put the most frequent words
first in the alternative list. When the token matches, it will then
match after less backtracking. When it does not match, the
speed is the same, because all alternatives have to be checked
anyway.

Given the German word frequencies found here:
http://german.about.com/library/blwfreq_t50.htm

  word  rank
  in    #4
  nach  #25
  vor   #36

it would then be faster to write:

<token regexp="yes">in|nach|vor</token>

This costs nothing to do, but the gain is probably only minor.
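
The reordering could also be done mechanically. Below is a minimal Java
sketch, not existing LanguageTool code: the class AlternationOrder, the
method reorder and the rank table are hypothetical names used only for
illustration, and the ranks are the three values listed above. It assumes
the alternation is a plain list of literal words, with no nested groups or
escaped "|" characters.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical helper, for illustration only.
final class AlternationOrder {

    // Reorders a plain "a|b|c" alternation so that words with a lower
    // frequency rank (i.e. more frequent words) come first.
    // Words missing from the rank table are treated as rarest; the sort
    // is stable, so they keep their original relative order.
    static String reorder(String alternation, Map<String, Integer> rank) {
        List<String> words = new ArrayList<>(List.of(alternation.split("\\|")));
        words.sort(Comparator.comparingInt(
                w -> rank.getOrDefault(w, Integer.MAX_VALUE)));
        return String.join("|", words);
    }

    public static void main(String[] args) {
        // Ranks taken from the frequency list quoted in this mail.
        Map<String, Integer> rank = Map.of("in", 4, "nach", 25, "vor", 36);
        System.out.println(reorder("vor|nach|in", rank));
    }
}
```

Since a token pattern has to match the whole word, reordering literal
alternatives like these should not change which tokens match, only how
quickly a match is found.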

I found the following here:

http://blogs.msdn.com/b/bclteam/archive/2010/08/03/optimizing-regular-expression-performance-part-ii-taking-charge-of-backtracking.aspx

===BEGIN QUOTE===
Alternation constructs are evaluated sequentially from left to right. The
second alternative is evaluated only if the first alternative fails. If
there are three or more alternatives, the third alternative is evaluated
only if the first and second alternatives fail, and so on. Because of this,
the ordering of items in an alternation construct is significant.
Subpatterns that are more likely to be encountered in an input string
should precede subpatterns that are less likely to be encountered.
===END QUOTE===
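
The quote is about .NET, but java.util.regex also tries alternatives from
left to right. A small sketch to show it (the class AlternationDemo and
the method firstMatch are made-up names for this example): when one
alternative is a prefix of another, the leftmost one that succeeds wins.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, for illustration only.
final class AlternationDemo {

    // Returns the text of the first match of regex in input, or null
    // if there is no match.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // "in" is tried first and succeeds, so "inside" is never tried.
        System.out.println(firstMatch("in|inside", "inside"));
        // With the order swapped, the longer alternative is tried first.
        System.out.println(firstMatch("inside|in", "inside"));
    }
}
```

This only shows the evaluation order; it says nothing about how large the
speed difference is in practice.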

-- Dominique