On 2012-11-16 10:10, R.J. Baars wrote:
>>> Hi,
>>>
>>> the load test I did for our HTTPS service showed that we have kind of a
>>> performance problem for languages that have a lot of rules. Testing a
>>> random German text with 125 sentences takes 2 seconds on my machine at
>>> average. About half of that time is spent in rule matching, i.e. in
>>> PatternRuleMatcher.match().
>>>
>>> What can be done about this?
>>>
>>> -More micro-optimization of the inner loop of pattern matching. I think
>>> there's not much potential in that, but I'd love to be proven wrong.
>>>
>>> -Make the checking process work in parallel to better use multiple
>>> cores.
>>>
>>> -Rewrite the pattern matching to use a finite state machine. I think this
>>> could improve performance a lot if we create one state machine that
>>> includes all rules of a language. This state machine would then work on a
>>> per-sentence basis.
>>>
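The parallel-checking option could look roughly like the following minimal Java sketch. It assumes each sentence can be checked independently; rules that look across sentence boundaries would need separate handling. The method `checkSentence` is a hypothetical stand-in for real rule matching, not LanguageTool's API, and only flags an immediately repeated word.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelCheckSketch {

    // Hypothetical stand-in for per-sentence rule matching:
    // reports an immediately repeated word (case-insensitive).
    static List<String> checkSentence(String sentence) {
        List<String> matches = new ArrayList<>();
        String[] words = sentence.split("\\s+");
        for (int i = 1; i < words.length; i++) {
            if (words[i].equalsIgnoreCase(words[i - 1])) {
                matches.add("repeated word: " + words[i]);
            }
        }
        return matches;
    }

    // Checks every sentence as an independent task on a fixed thread pool.
    // Results stay in sentence order because futures are read back in
    // submission order.
    static List<List<String>> checkAll(List<String> sentences, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (String s : sentences) {
                futures.add(pool.submit(() -> checkSentence(s)));
            }
            List<List<String>> results = new ArrayList<>();
            for (Future<List<String>> f : futures) {
                results.add(f.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

The thread count would typically be chosen near the number of available cores, e.g. `Runtime.getRuntime().availableProcessors()`.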
>> Instead of checking a text completely, check it gradually: the active
>> sentence first, then back and forth (client code).
>>
>> Could rules be 'compiled' instead of interpreted?
>>
>> Is there a significant difference between direct word matching, postag
>> matching and regexp matching? Would it help changing some postag-rules
>> into word-match rules for the most common words?
>>
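On word matching versus regexp matching: one common micro-optimization is to detect patterns that are plain words, compare them with `String.equals`, and compile a regex only when the pattern actually needs one. The minimal sketch below assumes that; `TokenMatcherSketch` and its metacharacter check are hypothetical and deliberately simplified, not how PatternRuleMatcher works. Whether it saves measurable time would have to be confirmed with a profiler.

```java
import java.util.regex.Pattern;

public class TokenMatcherSketch {

    // Characters that give a pattern string regex meaning (simplified check).
    // If none occur, the pattern can be matched by plain string comparison.
    private static final String REGEX_META = "\\[](){}.*+?^$|";

    private final String literal;  // set when the pattern is a plain word
    private final Pattern regex;   // set otherwise, compiled once up front

    TokenMatcherSketch(String pattern) {
        if (isLiteral(pattern)) {
            this.literal = pattern;
            this.regex = null;
        } else {
            this.literal = null;
            this.regex = Pattern.compile(pattern);
        }
    }

    static boolean isLiteral(String pattern) {
        for (char ch : pattern.toCharArray()) {
            if (REGEX_META.indexOf(ch) >= 0) {
                return false;
            }
        }
        return true;
    }

    boolean matches(String token) {
        // Fast path: direct (case-sensitive) equality, no regex engine involved.
        if (literal != null) {
            return literal.equals(token);
        }
        return regex.matcher(token).matches();
    }

    boolean usesRegex() {
        return regex != null;
    }
}
```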
>> Could rules be ordered by biggest chance to hit first?
>>
>> Could rules be structured in if-then-else-like structures? (Mutually
>> exclusive rules, e.g.)
>>
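Ordering by hit probability probably pays off mainly for mutually exclusive rules: every match of independent rules has to be reported, so all of those rules must run anyway, whereas in an exclusive group evaluation can stop at the first hit. A minimal sketch, assuming a hypothetical `Rule` record that pairs an id with a sentence test (records need Java 16+):

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

public class ExclusiveRuleGroup {

    // Hypothetical minimal rule: an id plus a condition on the sentence.
    record Rule(String id, Predicate<String> condition) {}

    private final List<Rule> rules; // ordered: most likely to hit first

    ExclusiveRuleGroup(List<Rule> rules) {
        this.rules = List.copyOf(rules);
    }

    // Evaluates rules in order and stops at the first hit, so the
    // remaining (mutually exclusive) rules are never evaluated.
    Optional<String> firstMatch(String sentence) {
        for (Rule r : rules) {
            if (r.condition().test(sentence)) {
                return Optional.of(r.id());
            }
        }
        return Optional.empty();
    }
}
```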
>> Is there time to be gained in looking up the postags for words?
>>
> Is part of the time spent on communication (client-server-client)? Is
> there any option to speed that up (compression, dropping spaces)?

This is true for the HTTP server: the shorter the strings sent, the
slower a whole document is processed, because we initialize the HTTP
server on every call. That makes it really awkward for local use, and
slow as well. Some client code does not expect this, which is why, for
example, CheckMate is painfully slow.

We should have some caching, IMHO, at least as an option.
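One form such caching could take is building the expensive per-language object once and reusing it across calls, instead of initializing it per request. A minimal sketch, where `ExpensiveChecker` is a hypothetical stand-in for whatever object loads the rules (it is not a LanguageTool class):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class CheckerCache {

    // Counts how often the expensive setup actually runs (for illustration).
    static final AtomicInteger INIT_COUNT = new AtomicInteger();

    // Hypothetical stand-in for an object that is costly to build,
    // e.g. one that loads all rules for a language.
    static class ExpensiveChecker {
        final String language;

        ExpensiveChecker(String language) {
            this.language = language;
            INIT_COUNT.incrementAndGet(); // rule loading would happen here
        }

        String check(String text) {
            return language + ":" + text.length();
        }
    }

    private final Map<String, ExpensiveChecker> cache = new ConcurrentHashMap<>();

    // Builds the checker for a language on first use and reuses it afterwards.
    ExpensiveChecker forLanguage(String language) {
        return cache.computeIfAbsent(language, ExpensiveChecker::new);
    }
}
```

Sharing one instance across concurrent requests is only safe if that object is thread-safe; otherwise a per-thread instance (e.g. via `ThreadLocal`) would be needed.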

Regards,
Marcin

_______________________________________________
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel
