Hi Andriy

Thanks for the write-up! It is a very good knowledge base for someone
who wants to work on improving LT's multithreaded performance.
On Thu, Jan 28, 2016 at 01:29:13PM -0500, Andriy Rysin wrote:
> so currently what we (approximately) do is
> 1) read the file (line by line in the general case) (main thread)
> 2) on a paragraph boundary we split it into sentences (main thread)
> 2a) only 1 thread is used up to this point
> 3) send the list of sentences to a threaded executor for analysis
> (tokenization/tagging/disambiguation); here the # of callables fed to
> the thread pool = # of sentences
> 3a) now we wait for all threads to finish
> 4) the analyzed sentences collected for the paragraph are then sent
> to a threaded executor for the rule check; here currently the # of
> callables = # of threads. If we increase the # of callables we get a
> speedup up to a certain point, but beyond that the finer granularity
> adds too much overhead, so I saw some slowdown (between 5-8% worse
> than at the peak)
> 4a) we wait for all threads to finish
> 5) we collect all the rule matches and go to 1)
>
> We lose time with idle threads in 1-2), 3a) and 4a). Ideally we
> should have all stages working in parallel, so that e.g. while we run
> the check, some thread is already reading the next chunk of the file
> and feeding it to the analyzer, and while we're reading the file it
> would be ideal to feed it sentence-by-sentence to the analysis. In
> reality it's not that straightforward, for several reasons:
> 1) current code logic:
> * some of the code to be parallelized is not quite functional, and
> we have a lot of auxiliary code that takes care of output formatting
> etc. which needs to be refactored
> 2) analysis/check specifics:
> * we can't easily grab sentences continuously, as we need to feed
> reasonably sized blocks of data to the sentence tokenizer first (one
> may argue whether a paragraph is a good chunking size)
> * some languages have rules that check at the paragraph level
> (inter-sentence checks), so they need to be fed the whole paragraph

I did not know that, and that definitely makes things more complicated.

> 3) texts fed into LT can have quite different characteristics
> depending on how paragraphs are split, how many sentences there are
> per paragraph, how long the sentences are, and how many and how
> complex our rules are, so to write "ideal" code you need to consider
> which approach benefits which case
>
> Most of those problems are solvable (to different degrees), but for
> some the effort may not be trivial. If anybody is willing to dive
> deeper into the subject I'll be glad to share my knowledge.
>
> The only thing I can say is that if we are not rewriting for a new
> multithreaded architecture (which would touch both -core and
> -commandline) but rather taking incremental improvements, I would
> look into running the analysis/check in parallel with file
> reading/sentence splitting - we wait in that block for an amount of
> time comparable to the wait time in the analysis and check blocks,
> but there all but 1 thread are idle, while in the other two cases
> only some threads are waiting. Also, due to the characteristics of
> most texts, I feel we get a lot of small paragraphs (the default
> logic splits on two newlines, which produces many small paragraphs
> for titles, notes, dialogs etc.), which means we start the
> analysis/check blocks too often with small chunks of data, thus
> losing performance.
>
> For now I thought that changing one number in 1 line, which leads to
> about a 25% speedup, is worth applying until we have a better
> solution. :)

Definitely. It sounds like solving some of the issues would involve a
substantial refactoring of certain parts of LT's processing pipeline.
Interesting work, but potentially very time-consuming.
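To make sure I understand the incremental idea of running the
analysis/check in parallel with file reading/sentence splitting: a
producer-consumer setup along these lines, I assume? This is only a
rough sketch; splitIntoSentences/analyze/checkRules are made-up
placeholders, not LT's actual API.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.*;

    class PipelineSketch {
      // Sentinel telling a worker there is no more input.
      private static final List<String> POISON = new ArrayList<>();

      void run(Iterable<String> paragraphs, int nThreads) throws Exception {
        // Bounded queue: the reader blocks if it gets too far ahead.
        BlockingQueue<List<String>> queue = new ArrayBlockingQueue<>(100);
        ExecutorService workers = Executors.newFixedThreadPool(nThreads);

        // Each worker takes whole paragraphs, so paragraph-level
        // (inter-sentence) rules still see complete paragraphs.
        for (int i = 0; i < nThreads; i++) {
          workers.submit(() -> {
            try {
              for (List<String> s = queue.take(); s != POISON; s = queue.take()) {
                checkRules(analyze(s));
              }
            } catch (InterruptedException e) {
              Thread.currentThread().interrupt();
            }
          });
        }

        // Meanwhile the main thread keeps reading and splitting while
        // the workers analyze and check earlier paragraphs.
        for (String para : paragraphs) {
          queue.put(splitIntoSentences(para));
        }
        for (int i = 0; i < nThreads; i++) {
          queue.put(POISON);  // one end marker per worker
        }
        workers.shutdown();
        workers.awaitTermination(1, TimeUnit.HOURS);
      }

      List<String> splitIntoSentences(String para) { return new ArrayList<>(); } // stub
      Object analyze(List<String> sentences) { return sentences; }               // stub
      void checkRules(Object analyzed) {}                                        // stub
    }

This glosses over collecting the rule matches back in document order,
and it only helps if paragraphs keep arriving while the workers are
busy, but it would remove the idle threads in your steps 1-2).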
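And on the load-balancing question Dominique raises below: a shared
rule queue would keep the number of submitted callables at the # of
threads while still balancing at per-rule granularity. A minimal
sketch, again with made-up Rule/match placeholders rather than LT's
real types:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Queue;
    import java.util.concurrent.*;

    class RuleQueueSketch {
      interface Rule {
        List<String> match(Object analyzedParagraph);  // placeholder
      }

      List<String> check(List<Rule> rules, Object analyzedParagraph,
                         int nThreads) throws Exception {
        // All threads pull from the same queue, so a thread that draws
        // a slow rule simply ends up processing fewer rules overall.
        Queue<Rule> todo = new ConcurrentLinkedQueue<>(rules);
        Queue<String> matches = new ConcurrentLinkedQueue<>();
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);

        for (int i = 0; i < nThreads; i++) {
          pool.submit(() -> {
            for (Rule r = todo.poll(); r != null; r = todo.poll()) {
              matches.addAll(r.match(analyzedParagraph));
            }
          });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
        return new ArrayList<>(matches);
      }
    }

The matches come out in arbitrary order here, so they would have to be
re-sorted by position afterwards - maybe that is part of the
split/merge overhead you measured with one callable per rule.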
Cheers,

Silvan

> Andriy
>
> 2016-01-28 12:46 GMT-05:00 Silvan Jegen <m...@sillymon.ch>:
> > Heyhey
> >
> > On Thu, Jan 28, 2016 at 08:57:49AM -0500, Andriy Rysin wrote:
> >> yes, that's the case I tried with # of callables = # of rules (see
> >> my previous email); the wait time went down quite a bit (as
> >> expected) but the overall processing time went up, I suspect
> >> because of split/merge overhead. But this depends heavily on the
> >> type/number of rules, the text and the CPU (e.g. if rule
> >> processing time is more unbalanced than in the Ukrainian case then
> >> increasing the # of callables will help; otherwise the effect
> >> could be the reverse), so we would have to try other languages with
> >
> > I may have missed a statistic in your earlier mail, but wouldn't
> > splitting up the text into sentences and then sending batches of
> > them to different threads result in the most even load
> > distribution? Because of the result-merging overhead this would
> > only make sense when the number of lines crosses a certain
> > threshold.
> >
> > Cheers,
> >
> > Silvan
> >
> >> different numbers of callables to see what's the best approach.
> >> I know we have regular Wikipedia checks for some languages - that
> >> could be a good benchmarking test.
> >>
> >> Regards,
> >> Andriy
> >>
> >> 2016-01-28 8:47 GMT-05:00 Dominique Pellé <dominique.pe...@gmail.com>:
> >> > Andriy Rysin wrote:
> >> >
> >> >> Then I realized that in the check method we split the rules
> >> >> into callables whose count is the # of cores available (in my
> >> >> case 8); as I have 347 rules this means each bucket is about 43
> >> >> rules, and since rules are not equal in complexity this could
> >> >> lead to quite unequal times for each thread.
> >> >
> >> > Hi Andriy
> >> >
> >> > Thanks for having a look at multi-thread performance.
> >> > I don't know the code as well as you do. But if we indeed
> >> > split the number of rules equally before processing them, then
> >> > it seems bad for balancing the work.
> >> >
> >> > Can't we instead have a queue with all the rules to be
> >> > processed? When a thread is ready to do work, it picks the next
> >> > rule to process from the queue, so the load would be well
> >> > balanced even if some rules are 10x slower than others. With
> >> > such a queue, a thread that picks up an expensive rule would end
> >> > up processing fewer rules than a thread that picks up fast
> >> > rules, keeping all CPUs as busy as possible.
> >> >
> >> > Regards
> >> > Dominique