On 2012-11-18 12:56, Daniel Naber wrote:
> On 18.11.2012, 10:39:21 Marcin Miłkowski wrote:
>
>> The method isStringTokenMatched() is quite optimized but we might try
>> something more.
>
> Thanks for those ideas. They sound valid and I guess one can show they work
> in a micro-benchmark but I just tried them and the effect on overall
> performance is so small that it cannot be measured reliably :-(

So it's better not to change anything there. My guess is that isStringTokenMatched() is simply called too many times, even though it is pretty fast by itself.

It is called in several places, the most common of which is testAllReadings(). There we use a naive search that is optimized but still not really nice for tokens with multiple readings: we test the token's readings one by one and stop at the first match, so in the worst case we go through all of them, also testing for possible exceptions. This might contribute to multiple calls of the same method.

There's some room for optimizing testAllReadings(), but I'm not sure it will help greatly. Namely, we can check whether the string in question (token, POS or lemma) is the same as the one checked before and, if so, skip the check. For some checks we might get some benefit, as multiple readings share the same surface token, and it's the surface token that's usually included in our rules.

The easiest way to see whether this could speed things up at all is to change line 162 in AbstractPatternRule.java to:

    final int numberOfReadings = 1;

If there's a huge difference, we might try to optimize by reusing the info about previous checks. Note that we can add some fields to the token, such as booleans tokenAsBefore, etc. These would be computed once and used multiple times.

I think there will be some difference, because I think I had some speedup when I discarded more readings during disambiguation in Polish (this is why more disambiguation rules almost never make anything slower).

Regards
Marcin
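
P.S. A rough sketch of the "same string as before" idea, only to make it concrete. Everything in it is hypothetical: the class ReadingMatchSketch, the Reading type, matchesToken() and both loop methods are stand-ins invented for illustration, not the actual LanguageTool code, and the comparison is a plain case-insensitive test rather than the real isStringTokenMatched() logic. It only covers the surface-token case; lemma and POS usually differ between readings, so they would benefit less.

```java
import java.util.List;

// Hypothetical sketch: none of these names are LanguageTool's real API.
final class ReadingMatchSketch {

  /** Minimal stand-in for one reading of a token. */
  static final class Reading {
    final String token;
    final String lemma;
    final String posTag;

    Reading(String token, String lemma, String posTag) {
      this.token = token;
      this.lemma = lemma;
      this.posTag = posTag;
    }
  }

  /** Counts how often the (simulated) string test actually runs. */
  static int comparisons = 0;

  /** Stand-in for the real string test; assumed to be a pure function of its arguments. */
  static boolean matchesToken(String pattern, String token) {
    comparisons++;
    return pattern.equalsIgnoreCase(token);
  }

  /** Naive loop: test every reading until one matches. */
  static boolean testAllReadingsNaive(String pattern, List<Reading> readings) {
    for (Reading reading : readings) {
      if (matchesToken(pattern, reading.token)) {
        return true;
      }
    }
    return false;
  }

  /** Same loop, but skip a reading whose surface token equals the previous one, which already failed. */
  static boolean testAllReadingsSkipping(String pattern, List<Reading> readings) {
    String previousToken = null;
    for (Reading reading : readings) {
      if (previousToken != null && previousToken.equals(reading.token)) {
        continue; // same string as before, and that one did not match
      }
      if (matchesToken(pattern, reading.token)) {
        return true;
      }
      previousToken = reading.token;
    }
    return false;
  }
}
```

With three readings that share the surface token "can" and a pattern that does not match it, the naive loop runs the string test three times and the skipping loop once. The equals() call is itself a string comparison, so this only pays off if the real test is noticeably more expensive, which is why a precomputed flag like tokenAsBefore would probably be the better variant.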