On 2012-11-18 12:56, Daniel Naber wrote:
> On 18.11.2012, 10:39:21 Marcin Miłkowski wrote:
>
>> The method isStringTokenMatched() is quite optimized but we might try
>> something more.
>
> Thanks for those ideas. They sound valid and I guess one can show they work
> in a micro-benchmark but I just tried them and the effect on overall
> performance is so small that it cannot be measured reliably :-(

So it's better not to change anything there.

My guess is that isStringTokenMatched() is simply called too many times,
even though it is pretty fast by itself. It is called in several
places, most often from testAllReadings(), and you can see
there that we use a naive search that is optimized but still not
really nice for multiple readings. Namely, we test the token's readings
one by one and, in the worst case, go through all of them, also testing
for possible exceptions. (If we have a match, we stop checking.) This
might contribute to multiple calls of the same method.
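
To make the call pattern concrete, here is a minimal sketch of such a loop. It is not the actual LanguageTool code: the Reading class, isStringMatched() and the matchCalls counter are hypothetical stand-ins, the comparison is reduced to an exact string match, and exception handling is left out.

```java
import java.util.List;

/** Simplified, hypothetical stand-in for the loop described above; not LanguageTool's code. */
final class NaiveReadingsSketch {

    /** One reading of a token (hypothetical, reduced to three strings). */
    static final class Reading {
        final String token;
        final String lemma;
        final String posTag;

        Reading(String token, String lemma, String posTag) {
            this.token = token;
            this.lemma = lemma;
            this.posTag = posTag;
        }
    }

    /** Counts comparisons so the number of calls per token is visible. */
    static int matchCalls = 0;

    /** Stand-in for isStringTokenMatched(): here only an exact comparison. */
    static boolean isStringMatched(String expected, String actual) {
        matchCalls++;
        return expected.equals(actual);
    }

    /** Tests readings in order; stops at the first match, else visits all of them. */
    static boolean testAllReadings(List<Reading> readings, String expectedToken) {
        for (Reading reading : readings) {
            if (isStringMatched(expectedToken, reading.token)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Reading> readings = List.of(
            new Reading("walks", "walk", "VBZ"),
            new Reading("walks", "walk", "NNS"));
        boolean matched = testAllReadings(readings, "runs");
        System.out.println("matched=" + matched + ", calls=" + matchCalls);
    }
}
```

With a non-matching rule token, every reading is compared even though both readings carry the same surface form, which is the repeated work in question.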

There's some room for optimizing testAllReadings(), but I'm not sure
this will help greatly. Namely, we can check whether the string in
question (token, POS or lemma) is the same as the one checked before
and, if so, skip the check and reuse the previous result. For some
checks we might get some benefit, as multiple readings share the same
surface token, and it's the surface token that's usually included in
our rules.
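
A rough sketch of that idea, under assumptions: LastCheckCacheSketch, matches() and realChecks are made-up names, and a regex match stands in for the real check. It only pays off when the real check costs more than the equals() call used to detect the repeat.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch: remember the last string tested and reuse its result. */
final class LastCheckCacheSketch {
    private final Pattern pattern;
    private String lastChecked;   // string tested most recently, null before the first call
    private boolean lastResult;   // result of that test
    int realChecks = 0;           // number of times the expensive check actually ran

    LastCheckCacheSketch(String regex) {
        this.pattern = Pattern.compile(regex);
    }

    /** Returns the cached result if the candidate equals the previously tested string. */
    boolean matches(String candidate) {
        if (candidate.equals(lastChecked)) {
            return lastResult;
        }
        realChecks++;
        lastChecked = candidate;
        lastResult = pattern.matcher(candidate).matches();
        return lastResult;
    }

    public static void main(String[] args) {
        LastCheckCacheSketch matcher = new LastCheckCacheSketch("walk.*");
        // Two readings with the same surface token, then a different token.
        for (String token : new String[] {"walks", "walks", "runs"}) {
            System.out.println(token + " -> " + matcher.matches(token));
        }
        System.out.println("real checks: " + matcher.realChecks);
    }
}
```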

The easiest way to see whether this could speed things up at all is to
change line 162 in AbstractPatternRule.java to this one:

final int numberOfReadings = 1;

If there's a huge difference, then we might try to optimize by reusing
the info about previous checks. Note that we can add some fields to
the token, for example booleans such as tokenAsBefore. These would be
computed once and used multiple times.
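
Something along these lines, as a sketch only: the Reading class, buildReadings() and checksNeeded() are hypothetical, and only the tokenAsBefore name comes from the suggestion above. The flag is set once when the readings are built, so a rule can later skip a reading without comparing strings again.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: compute a tokenAsBefore flag once per reading. */
final class TokenAsBeforeSketch {

    /** A reading that knows whether its surface token repeats the previous reading's. */
    static final class Reading {
        final String token;
        final boolean tokenAsBefore;

        Reading(String token, boolean tokenAsBefore) {
            this.token = token;
            this.tokenAsBefore = tokenAsBefore;
        }
    }

    /** Builds readings and sets the flag by comparing each token with its predecessor. */
    static List<Reading> buildReadings(List<String> surfaceTokens) {
        List<Reading> result = new ArrayList<>();
        String previous = null;
        for (String token : surfaceTokens) {
            result.add(new Reading(token, token.equals(previous)));
            previous = token;
        }
        return result;
    }

    /** Number of readings whose surface token would still need a real check. */
    static int checksNeeded(List<Reading> readings) {
        int count = 0;
        for (Reading reading : readings) {
            if (!reading.tokenAsBefore) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<Reading> readings = buildReadings(List.of("walks", "walks", "walks"));
        System.out.println("checks needed: " + checksNeeded(readings));
    }
}
```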

I think there will be some difference, because I believe I got some
speedup when I discarded more readings during disambiguation in Polish
(this is why more disambiguation rules almost never make anything slower).

Regards
Marcin



_______________________________________________
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel
