So the main problem with this performance improvement is that we read
across paragraphs. There are two problems with this:
1) The error context shows sentences from another paragraph. I almost
worked out a solution for that by adjusting ContextTools, but then I
found the next one:
2) The cross-sentence rules start to work across paragraphs.

While analyzing the code I also found that if we read from a file and
it's smaller than 64k, we don't parse it by paragraphs, so the
cross-sentence rules work across paragraphs here too. This can be
observed in MainTest.testEnglishFile(), which gives 3 matches, vs.
MainTest.testEnglishStdIn4(), which reads the same text from stdin and
gives 4.

If we are to fix the "small file" case by splitting into paragraphs,
would it make sense to remove the special handling for small files? A
small file would be checked fast anyway, and removing the extra if/else
blocks would clean up the code logic...

Thanks
Andriy

2015-02-20 9:00 GMT-05:00 Andriy Rysin <ary...@gmail.com>:
> So before wrapping these optimizations up I decided to take a last
> look at the thread graph in jvisualvm, and it showed that the worker
> threads spend more time in the park state than in running. But the
> graph was really not showing why, it was more like a noodle soup. So I
> brought one of my past optimizations back in: always read the file in
> big blocks (don't start analyze/check on each paragraph break). This
> made the thread graph very clear: besides waiting for the main thread
> to prepare sentences, the check threads' run times were not equal (we
> had an equal number of rules per thread, which does not actually
> amount to the same load). So I've added another of my test
> optimizations which didn't help before: creating a callable for each
> rule rather than for a group of rules.
> The result: my CPU idle state went from 40% to 10% (now pretty much
> all of those 10% is in the main thread; we could optimize it too but
> will have to refactor our workflow a bit). The speed went up from
> ~2500 (~1900 originally, before the previous optimizations) to ~2700
> sentences/s.
> With this change adding more threads than CPUs doesn't help (it
> actually decreases performance), so we could probably get rid of the
> new internal property.
>
> Just to note: there's a slight change in output: as we don't split the
> check on each paragraph change, in the output some sentences with
> errors will have the beginning of the next sentence (beyond the
> paragraph break). Hopefully it's not a big deal.
>
> I will need to work on cleaning things up, add changes for
> SameRuleGroupFilter and then will create another branch for everybody
> to test it out.
>
> Andriy
>
> 2015-02-20 8:10 GMT-05:00 Daniel Naber <daniel.na...@languagetool.org>:
>> On 2015-02-19 22:16, Andriy Rysin wrote:
>>
>>> I've merged multithreading branch into master. Please try it out when
>>> you have a chance and let me know if you see any issues.
>>
>> Thanks. Some small cleanup ideas:
>>
>> -setThreadPoolSize should probably be a parameter of the constructor, as
>> calling it after thread pool setup would fail anyway ("Thread pool
>> already initialized")
>> -Does newFixedThreadPool need to use lazy init? If it gets initialized
>> in the constructor, it can also be made final.
>> -It can be 'threadPool' I think, no need for the 'new' and 'fixed' in
>> the variable name.
>>
>> Regards
>> Daniel
_______________________________________________
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel
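
For readers following the thread, here is a minimal sketch of the two ideas discussed above: one callable per rule (rather than per group of rules), and a thread pool that is created in the constructor so it can be final and simply named 'threadPool'. This is not the actual LanguageTool code; SketchRule, PerRuleChecker and their methods are hypothetical names made up for illustration, and only the java.util.concurrent calls are real.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for a rule; not LanguageTool's Rule class.
interface SketchRule {
    List<String> match(List<String> sentences);
}

// Hypothetical checker, for illustration only.
class PerRuleChecker implements AutoCloseable {
    // Created in the constructor, so no lazy init and the field can be final.
    private final ExecutorService threadPool;

    PerRuleChecker(int threadPoolSize) {
        this.threadPool = Executors.newFixedThreadPool(threadPoolSize);
    }

    List<String> check(List<SketchRule> rules, List<String> sentences)
            throws InterruptedException, ExecutionException {
        // One task per rule: a slow rule then occupies one worker only,
        // while the other workers keep pulling the remaining rules.
        List<Callable<List<String>>> tasks = new ArrayList<>();
        for (SketchRule rule : rules) {
            tasks.add(() -> rule.match(sentences));
        }
        List<String> matches = new ArrayList<>();
        // invokeAll blocks until all tasks finish and returns the
        // futures in the same order as the submitted tasks.
        for (Future<List<String>> future : threadPool.invokeAll(tasks)) {
            matches.addAll(future.get());
        }
        return matches;
    }

    @Override
    public void close() {
        threadPool.shutdown();
    }
}
```

With per-rule tasks the pool size can stay at the number of CPUs, which would match the observation that extra threads stop helping. Note that this sketch says nothing about the paragraph question: whatever list of sentences is passed to check() is what the cross-sentence rules will see.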