Hi Andriy

Thanks for the write-up! It is a very good knowledge base for someone
who wants to work on improving LT's multithreaded performance.
On Thu, Jan 28, 2016 at 01:29:13PM -0500, Andriy Rysin wrote:
> so currently what we (approximately) do is
> 1) read the file (line by line in the general case) (main thread)
> 2) on a paragraph boundary we split it into sentences (main thread)
> 2a) only 1 thread is used up to this point
> 3) send the list of sentences to a threaded executor for analysis
> (tokenization/tagging/disambiguation); here the # of callables fed to
> the thread pool = # of sentences
> 3a) now we wait for all threads to finish
> 4) the analyzed sentences collected for the paragraph are then sent
> to a threaded executor for the rule check; here currently the # of
> callables = # of threads. If we increase the # of callables we get a
> speedup up to a certain point, but beyond that the finer granularity
> adds too much overhead, so I saw some slowdown (between 5-8% worse
> than at the peak)
> 4a) we wait for all threads to finish
> 5) we collect all the rule matches and go to 1)
>
> We lose time with idle threads in 1-2), 3a) and 4a). Ideally we
> should have all stages working in parallel, so that e.g. while we run
> the check, some thread is already reading the next chunk of the file
> and feeding it to the analyzer, and while we're reading the file it
> would be ideal to feed it sentence-by-sentence to the analysis. In
> reality it's not that straightforward, for several reasons:
> 1) current code logic:
> * some of the code to be parallelized is not quite functional, and
> we have a lot of auxiliary code that takes care of output formatting
> etc. which needs to be refactored
> 2) analysis/check specifics:
> * we can't easily grab sentences continuously, as we need to feed
> reasonably sized blocks of data to the sentence tokenizer first (one
> may argue whether a paragraph is a good chunking size)
> * some languages have rules that check at the paragraph level
> (inter-sentence checks), so they need to be fed the whole paragraph

I did not know that, and that definitely makes things more complicated.

> 3) texts fed into LT can have quite different characteristics
> depending on how paragraphs are split, how many sentences there are
> per paragraph, how long the sentences are, and how many and how
> complex our rules are, so to write "ideal" code you need to consider
> which approach benefits which case
>
> Most of those problems are solvable (to different degrees), but for
> some the effort may not be trivial. If anybody is willing to dive
> deeper into the subject I'll be glad to share my knowledge.
>
> The only thing I can say is that if we are not rewriting for a new
> multithreaded architecture (which would touch both -core and
> -commandline) but rather taking incremental improvements, I would
> look into running the analysis/check in parallel with file
> reading/sentence splitting - we wait in that block for an amount of
> time comparable to the wait time in the analysis and check blocks,
> but there all but 1 thread are idle, while in the other two cases
> only some threads are waiting. Also, due to the characteristics of
> most texts, I feel we get a lot of small paragraphs (the default
> logic splits on two newlines, which produces many small paragraphs
> for titles, notes, dialogs etc.), which means we start the
> analysis/check blocks too often with small chunks of data, thus
> losing performance.
>
> For now I thought that changing one number in 1 line, which leads to
> about a 25% speedup, is worth applying until we have a better
> solution. :)

Definitely. It sounds like solving some of the issues would involve a
substantial refactoring of certain parts of LT's processing pipeline.
Interesting work, but potentially very time-consuming.
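To make sure I understand the incremental idea of running the
analysis/check in parallel with file reading/sentence splitting: a
producer-consumer setup along these lines, I assume? This is only a
rough sketch; splitIntoSentences/analyze/checkRules are made-up
placeholders, not LT's actual API.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.*;

    class PipelineSketch {
      // Sentinel telling a worker there is no more input.
      private static final List<String> POISON = new ArrayList<>();

      void run(Iterable<String> paragraphs, int nThreads) throws Exception {
        // Bounded queue: the reader blocks if it gets too far ahead.
        BlockingQueue<List<String>> queue = new ArrayBlockingQueue<>(100);
        ExecutorService workers = Executors.newFixedThreadPool(nThreads);

        // Each worker takes whole paragraphs, so paragraph-level
        // (inter-sentence) rules still see complete paragraphs.
        for (int i = 0; i < nThreads; i++) {
          workers.submit(() -> {
            try {
              for (List<String> s = queue.take(); s != POISON; s = queue.take()) {
                checkRules(analyze(s));
              }
            } catch (InterruptedException e) {
              Thread.currentThread().interrupt();
            }
          });
        }

        // Meanwhile the main thread keeps reading and splitting while
        // the workers analyze and check earlier paragraphs.
        for (String para : paragraphs) {
          queue.put(splitIntoSentences(para));
        }
        for (int i = 0; i < nThreads; i++) {
          queue.put(POISON);  // one end marker per worker
        }
        workers.shutdown();
        workers.awaitTermination(1, TimeUnit.HOURS);
      }

      List<String> splitIntoSentences(String para) { return new ArrayList<>(); } // stub
      Object analyze(List<String> sentences) { return sentences; }               // stub
      void checkRules(Object analyzed) {}                                        // stub
    }

This glosses over collecting the rule matches back in document order,
and it only helps if paragraphs keep arriving while the workers are
busy, but it would remove the idle threads in your steps 1-2).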
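And on the load-balancing question Dominique raises below: a shared
rule queue would keep the number of submitted callables at the # of
threads while still balancing at per-rule granularity. A minimal
sketch, again with made-up Rule/match placeholders rather than LT's
real types:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Queue;
    import java.util.concurrent.*;

    class RuleQueueSketch {
      interface Rule {
        List<String> match(Object analyzedParagraph);  // placeholder
      }

      List<String> check(List<Rule> rules, Object analyzedParagraph,
                         int nThreads) throws Exception {
        // All threads pull from the same queue, so a thread that draws
        // a slow rule simply ends up processing fewer rules overall.
        Queue<Rule> todo = new ConcurrentLinkedQueue<>(rules);
        Queue<String> matches = new ConcurrentLinkedQueue<>();
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);

        for (int i = 0; i < nThreads; i++) {
          pool.submit(() -> {
            for (Rule r = todo.poll(); r != null; r = todo.poll()) {
              matches.addAll(r.match(analyzedParagraph));
            }
          });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
        return new ArrayList<>(matches);
      }
    }

The matches come out in arbitrary order here, so they would have to be
re-sorted by position afterwards - maybe that is part of the
split/merge overhead you measured with one callable per rule.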
Cheers,

Silvan

> Andriy
>
> 2016-01-28 12:46 GMT-05:00 Silvan Jegen <m...@sillymon.ch>:
> > Heyhey
> >
> > On Thu, Jan 28, 2016 at 08:57:49AM -0500, Andriy Rysin wrote:
> >> yes, that's the case I tried with # of callables = # of rules (see
> >> my previous email); the wait time went down quite a bit (as
> >> expected) but the overall processing time went up, I suspect
> >> because of split/merge overhead. But this depends heavily on the
> >> type/number of rules, the text and the CPU (e.g. if rule
> >> processing time is more unbalanced than in the Ukrainian case then
> >> increasing the # of callables will help; otherwise the effect
> >> could be the reverse), so we would have to try other languages with
> >
> > I may have missed a statistic in your earlier mail, but wouldn't
> > splitting up the text into sentences and then sending batches of
> > them to different threads result in the most even load
> > distribution? Because of the result-merging overhead this would
> > only make sense when the number of lines crosses a certain
> > threshold.
> >
> > Cheers,
> >
> > Silvan
> >
> >> different numbers of callables to see what's the best approach.
> >> I know we have regular Wikipedia checks for some languages - that
> >> could be a good benchmarking test.
> >>
> >> Regards,
> >> Andriy
> >>
> >> 2016-01-28 8:47 GMT-05:00 Dominique Pellé <dominique.pe...@gmail.com>:
> >> > Andriy Rysin wrote:
> >> >
> >> >> Then I realized that in the check method we split the rules
> >> >> into callables whose count is the # of cores available (in my
> >> >> case 8); as I have 347 rules this means each bucket is about 43
> >> >> rules, and since rules are not equal in complexity this could
> >> >> lead to quite unequal times for each thread.
> >> >
> >> > Hi Andriy
> >> >
> >> > Thanks for having a look at multi-thread performance.
> >> > I don't know the code as well as you do. But if we indeed
> >> > split the number of rules equally before processing them, then
> >> > it seems bad for balancing the work.
> >> >
> >> > Can't we instead have a queue with all the rules to be
> >> > processed? When a thread is ready to do work, it picks the next
> >> > rule to process from the queue, so the load would be well
> >> > balanced even if some rules are 10x slower than others. With
> >> > such a queue, a thread that picks up an expensive rule would end
> >> > up processing fewer rules than a thread that picks up fast
> >> > rules, keeping all CPUs as busy as possible.
> >> >
> >> > Regards
> >> > Dominique