OK, I worked on this a bit more and didn't get anything as good as in the first run:
Since the main thread that reads the file and tokenizes sentences is
always single-threaded, I tested some improvements there:

1) In commandline.Main we call handleLine (and all the heavy processing
   using threads) on a double newline (considering it a paragraph break),
   but (at least in my text files) very often there are multiple newlines
   and sometimes multiple lines with whitespace only. So on a new
   paragraph I tried to shortcut several cases:
   a) if there's a new paragraph but the text read from the file is
      shorter than 100 characters, keep reading
   b) if there's a new paragraph but the text is only whitespace, skip
      processing (drop the text)
   c) increase the buffer size from 64k to 640k
2) When we start threads to analyze and check sentences, don't block the
   main thread waiting for them to return, but go and read the next lines
   in parallel; when the analyze/check threads are done we already have
   new tokenized sentences for them to start munching on again.

Surprisingly, none of those gave any significant performance
improvement. The only explanation I have is that the main thread really
does not take much CPU any more, so even though it's a bottleneck it is
a short one :) and improvements there don't help much.

Another thing I tried is to increase the # of threads. Currently we use
the # of cores (which gives 4 on my i3) and I forced it to be 5, 6, 7, 8.
I noticed some small improvement between 4 and 7 threads (2700
sentences/s vs 2500 = ~10%) but I cannot reproduce it consistently
(although it consistently drives my CPU to be less idle, ~40% instead of
~50%). I don't have much of an explanation for this, so I introduced a
system property (org.languagetool.thread_count) if you want to force a
different # of threads.

It still bugs me that we don't use almost half of the CPU available :)
My guess is that the analyze/check threads contend for some resource, but
I could not figure out quickly if that's true and, if yes, where it is.
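Roughly, the thread count override described above could be read like
this. This is a simplified sketch, not the code from the branch: the
class and method names are made up for illustration, only the property
name comes from this message, and falling back to the core count on bad
values is an assumption.

```java
// Hypothetical helper for illustration; not the actual LanguageTool code.
final class ThreadCountConfig {

  // Property name as mentioned in the message; everything else is assumed.
  static final String PROPERTY = "org.languagetool.thread_count";

  /**
   * Returns the forced thread count if the system property holds a
   * positive integer, otherwise the number of available cores.
   */
  static int resolveThreadCount() {
    int cores = Runtime.getRuntime().availableProcessors();
    // Integer.getInteger returns the default when the property is
    // missing or is not a valid number.
    int forced = Integer.getInteger(PROPERTY, cores);
    return forced > 0 ? forced : cores;
  }

  public static void main(String[] args) {
    System.out.println("Using " + resolveThreadCount() + " threads");
  }
}
```

With something like this, the value can be forced from the command line
by passing -Dorg.languagetool.thread_count=7 to the JVM.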
The top two places the threads spend time in are java.util.regex.Matcher
and morfologik.fsa.CFSA2; the first should be parallelizable, not sure
about the second one.

Unfortunately I won't be able to spend much more time analyzing
performance for now, so if we're OK with the current changes in the
multithreading branch we should merge them into master.

Regards,
Andriy

2015-02-16 6:20 GMT-05:00 R.J. Baars <r.j.ba...@xs4all.nl>:
> Great performance achievement!
>
>> I've pushed a new branch "multithreading" into git. There are 3
>> changes right now:
>> 1) Don't recreate thread pool
>> 2) Analyze sentences in threads
>> 3) Optimize some code on main thread (as all coordination goes through
>> a main thread it is a bottleneck and any improvement there helps a
>> lot)
>>
>> On my profiling case (big text file) the performance improvement is ~40%:
>> < Time: 321253ms for 610758 sentences (1901.2 sentences/sec)
>> ---
>> > Time: 241689ms for 610758 sentences (2527.0 sentences/sec)
>>
>> The main thread now only takes ~6%, pretty much almost all of it in
>> sentence tokenizer (according to jmc).
>> My 4 cores on i3 now idle ~50% down from ~60%. Much better though not
>> perfect.
>>
>> It looks like if we want to make it truly parallelized there are two
>> things to do:
>> 1) tokenize sentences in parallel - this probably is not trivial
>> 2) streamline the whole process: right now we're reading the file,
>> sentence tokenizing happens in main thread, then we analyze sentences
>> in threads (split by sentences), then main thread collects the
>> results, then we check rules in threads (split by rules), then main
>> collects the results - we would need to remove these "checkpoints" on
>> main thread to make it truly parallelized (maybe when we switch to
>> Java 8 we could use its streams to simplify this)
>>
>> I didn't add any unit tests as there were no new API or functionality
>> and existing unit tests pass and my big test on 7 million word text
>> does not produce any regression.
>>
>> Please take a look and let me know if this works for you and if
>> there's anything else we need to do to merge this into master.
>>
>> Andriy
>>
>> 2015-02-12 22:35 GMT-05:00 Andriy Rysin <ary...@gmail.com>:
>>> So I've played with this a bit today and here's what I found:
>>> with 3 relatively small changes:
>>> 1) reuse thread pool rather than recreate it every time (this is
>>> probably the least important from performance point of view but it's
>>> easier to profile 4 worker threads than hundreds)
>>> 2) run sentence analyzer in parallel (using same pool as for rules)
>>> 3) start one callable for each rule check instead of group of rules -
>>> this way we spread the check more evenly (should help if e.g. rules in
>>> one group take much longer than the other)
>>>
>>> I was able to get the cpu utilization in multi-threaded LT from 40% to
>>> 60% and the time to check text of 7 million words went down by
>>> 20% (1900 sentences/sec before, 2378 after).
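
To illustrate change 3) from the list above (one callable per rule
instead of one per group of rules), here is a minimal sketch. It is not
the code from the branch: a "rule" is modeled as a plain function from a
sentence to a list of match messages, and PerRuleCheck and checkAll are
made-up names. It uses Java 8 lambdas for brevity; on Java 7 the same
thing can be written with anonymous Callable classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.function.Function;

// Illustrative only: these are not LanguageTool's real types.
final class PerRuleCheck {

  /** Runs each rule as its own task on the shared pool; results come back in rule order. */
  static List<String> checkAll(ExecutorService pool,
                               List<Function<String, List<String>>> rules,
                               String sentence) {
    List<Callable<List<String>>> tasks = new ArrayList<>();
    for (Function<String, List<String>> rule : rules) {
      // one callable per rule, not per group of rules
      tasks.add(() -> rule.apply(sentence));
    }
    List<String> matches = new ArrayList<>();
    try {
      // invokeAll waits for all tasks and returns the futures in task order
      for (Future<List<String>> result : pool.invokeAll(tasks)) {
        matches.addAll(result.get());
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while checking", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("a rule failed", e.getCause());
    }
    return matches;
  }
}
```

Note that a blocking collect like invokeAll is exactly the kind of
"checkpoint" on the calling thread discussed in this thread: the caller
does nothing else until the slowest rule has finished.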
>>>
>>> There still seems to be room for improvement, I can see two things:
>>> 1) run sentence tokenizer in parallel as well (ideally everything
>>> after reading paragraph should be parallel) - LT still spends ~13%
>>> time in main thread which means it can make other threads starve
>>> 2) streamline the end-to-end unit of work into one callable so we
>>> don't have to wait between tokenizing, analyzing, and checking; the
>>> problem here seems to be that we can only split sentence analyzing by
>>> sentences but for rules we split by rules. We need to check if
>>> splitting rule check by sentences performs worse than when split by
>>> rules (but if it performs well we can get rid of rule group filter
>>> problem).
>>>
>>> There was one interesting side effect though: when I split each rule
>>> into callable there was a regression for rule group filter. It seems
>>> if we split the rules the group filter does not work. But
>>> theoretically current code can fail here too: if two rules are in the
>>> same rule group and have same result but we split those two rules in
>>> separate threads the rule group filter will not work. I must say I
>>> didn't run any tests to prove this theory and the chances of this
>>> happening are low (at least with big number of rules and low cpu
>>> count) but it should still be possible.
>>>
>>> I'll do a bit more research when I have time.
>>>
>>> Andriy
>>>
>>> 2015-02-11 7:39 GMT-05:00 Daniel Naber <daniel.na...@languagetool.org>:
>>>> On 2015-02-11 05:07, Andriy Rysin wrote:
>>>>
>>>>> 1) it seems like we're currently creating and destroying thread pool
>>>>> every time we check sentences, would it not make more sense to create
>>>>> pool once and keep threads in the pool and reuse them?
>>>>
>>>> I think so. The number of threads should then probably be specified via
>>>> constructor, not via a set method, so it cannot be changed.
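
A minimal sketch of the idea in the exchange above: the pool is created
once, its size is fixed in the constructor with no setter, and every
call reuses the same threads. PooledAnalyzer and countTokens are
hypothetical names, and the token count is only a stand-in for real
sentence analysis; none of this is the actual LanguageTool API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical class for illustration only.
final class PooledAnalyzer implements AutoCloseable {

  private final int threadCount;       // fixed at construction, no setter
  private final ExecutorService pool;  // created once, reused by every call

  PooledAnalyzer(int threadCount) {
    if (threadCount < 1) {
      throw new IllegalArgumentException("threadCount must be >= 1");
    }
    this.threadCount = threadCount;
    this.pool = Executors.newFixedThreadPool(threadCount);
  }

  int getThreadCount() {
    return threadCount;
  }

  /** Stand-in for sentence analysis: counts whitespace-separated tokens, one task per sentence. */
  List<Integer> countTokens(List<String> sentences) {
    List<Callable<Integer>> tasks = new ArrayList<>();
    for (String sentence : sentences) {
      tasks.add(() -> sentence.trim().split("\\s+").length);
    }
    List<Integer> counts = new ArrayList<>();
    try {
      // Results are returned in the same order as the input sentences.
      for (Future<Integer> result : pool.invokeAll(tasks)) {
        counts.add(result.get());
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while analyzing", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("analysis task failed", e.getCause());
    }
    return counts;
  }

  @Override
  public void close() {
    pool.shutdown();  // lets the worker threads exit once the analyzer is no longer needed
  }
}
```

Keeping the pool for the lifetime of the object also means a profiler
sees the same few worker threads for the whole run instead of a new set
on every call.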
>>>>
>>>>> So I am wondering if somebody can shed a bit more light on what are
>>>>> the critical parts that are not thread-safe and is it worth exploring
>>>>> if they can be parallelized or it's too much work for not much gain?
>>>>
>>>> I don't think there's a better way than carefully looking at the code
>>>> to make sure it's thread-safe *and* have extensive tests that run with
>>>> several threads (not as active unit tests, as they run too long). But I
>>>> think it makes sense to make more code run in parallel.
>>>>
>>>> Regards
>>>> Daniel
>>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>> Dive into the World of Parallel Programming. The Go Parallel Website,
>>>> sponsored by Intel and developed in partnership with Slashdot Media, is
>>>> your hub for all things parallel software development, from weekly thought
>>>> leadership blogs to news, videos, case studies, tutorials and more.
>>>> Take a look and join the conversation now. http://goparallel.sourceforge.net/
>>>> _______________________________________________
>>>> Languagetool-devel mailing list
>>>> Languagetool-devel@lists.sourceforge.net
>>>> https://lists.sourceforge.net/lists/listinfo/languagetool-devel