Hi Silvan

So, currently, what we (approximately) do is (sketched in code after the list):
1) read the file (line by line in the general case) (main thread)
2) on a paragraph boundary, split it into sentences (main thread)
2a) only 1 thread is used up to this point
3) send the list of sentences to a threaded executor for analysis
(tokenization/tagging/disambiguation); here the # of callables fed to
the thread pool = # of sentences
3a) now we wait for all threads to finish
4) the analyzed sentences collected for the paragraph are then sent to
a threaded executor for the rule check; here currently the # of
callables = # of threads. If we increase the # of callables we get a
speedup up to a certain point, but beyond that point the finer
granularity adds too much overhead, so I saw some slowdown (between
5% and 8% worse than at the peak)
4a) we wait for all threads to finish
5) we collect all the rule matches and go to 1)
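
In (pseudo-)code the flow above looks roughly like the sketch below;
the class and helper names are illustrative placeholders, not the
actual LT classes:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CurrentFlowSketch {

  static final int THREADS = Runtime.getRuntime().availableProcessors();

  public static void main(String[] args) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(THREADS);
    for (String paragraph : readParagraphs()) {   // steps 1-2, main thread only
      List<String> sentences = splitSentences(paragraph);

      // Step 3: one callable per sentence.
      List<Callable<String>> analysisTasks = new ArrayList<>();
      for (String s : sentences) {
        analysisTasks.add(() -> analyze(s));
      }
      // Step 3a: invokeAll() blocks until every analysis task is done.
      List<Future<String>> analyzed = pool.invokeAll(analysisTasks);

      // Step 4: the analyzed paragraph is re-bucketed into as many
      // callables as there are threads, each checking a slice of the rules.
      List<Callable<List<String>>> checkTasks = new ArrayList<>();
      for (int i = 0; i < THREADS; i++) {
        final int bucket = i;
        checkTasks.add(() -> checkRuleBucket(analyzed, bucket, THREADS));
      }
      // Step 4a: again we block until the slowest bucket finishes.
      for (Future<List<String>> f : pool.invokeAll(checkTasks)) {
        collectMatches(f.get());                  // step 5
      }
    }
    pool.shutdown();
  }

  // Placeholder helpers, just enough to make the sketch compile.
  static List<String> readParagraphs() { return List.of("Some text."); }
  static List<String> splitSentences(String p) { return List.of(p); }
  static String analyze(String s) { return s; }
  static List<String> checkRuleBucket(List<Future<String>> sentences,
                                      int bucket, int buckets) { return List.of(); }
  static void collectMatches(List<String> matches) { }
}

The two invokeAll() calls are the 3a) and 4a) barriers: the pool sits
partly idle while the slowest task in each batch finishes.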

We lose time with idle threads in 1-2), 3a) and 4a). Ideally all
stages would work in parallel: e.g. while we run the check, some
thread should already be reading the next chunk of the file and
feeding it to the analyzer, and while we're reading the file it would
be ideal to feed sentences one by one to the analysis (see the queue
sketch after the list below). In reality it's not that
straightforward, for several reasons:
1) current code logic:
 * some of the code to be parallelized is not written in a functional
style, and we have a lot of auxiliary code that takes care of output
formatting etc., which needs to be refactored
2) analysis/check specifics:
 * we can't easily grab sentences continuously, as we first need to
feed reasonably sized blocks of data to the sentence tokenizer (one
may argue whether a paragraph is a good chunk size)
 * some languages have rules that check at the paragraph level
(inter-sentence checks), so they need to be fed the whole paragraph
3) texts fed into LT may have quite different characteristics:
depending on how paragraphs are split, how many sentences a paragraph
contains, how long the sentences are, and how many and how complex
our logic/rules are, "ideal" code would need to consider which
approach benefits which case
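
To make the overlap concrete, here is a minimal producer-consumer
sketch, assuming a simple bounded queue and ignoring the caveats
above (names are illustrative, not actual LT code): a reader thread
pushes paragraphs into the queue while the pool drains it, so reading
and checking happen at the same time.

import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class PipelineSketch {

  // Distinct sentinel object marking the end of input ("poison pill").
  private static final String EOF = new String("<eof>");

  public static void main(String[] args) throws Exception {
    BlockingQueue<String> paragraphs = new ArrayBlockingQueue<>(64);
    ExecutorService workers = Executors.newFixedThreadPool(
        Runtime.getRuntime().availableProcessors());

    // Producer: reads the file and enqueues paragraphs as they appear.
    Thread reader = new Thread(() -> {
      try {
        for (String p : readParagraphs()) {
          paragraphs.put(p);         // blocks only when the queue is full
        }
        paragraphs.put(EOF);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    reader.start();

    // Consumer: hands each paragraph to the pool as soon as it arrives.
    while (true) {
      String p = paragraphs.take();
      if (p == EOF) {                // reference comparison on purpose
        break;
      }
      workers.submit(() -> analyzeAndCheck(p));
    }
    workers.shutdown();
    workers.awaitTermination(1, TimeUnit.HOURS);
  }

  static List<String> readParagraphs() { return List.of("Some text."); }
  static void analyzeAndCheck(String p) { /* tokenize, tag, run rules */ }
}

Note that this sketch produces matches out of order, so real code
would also have to reorder the results (and respect the
paragraph-level rules from 2) above), which is part of why it's not
straightforward.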

Most of these problems are solvable (to varying degrees), but for
some the effort may not be trivial. If anybody is willing to dive
deeper into the subject I'll be glad to share my knowledge.

The only thing I can say is: if we are not rewriting for a new
multithreaded architecture (which would take changes in both -core
and -commandline) but rather taking incremental improvements, I would
look into running the analysis/check in parallel with file
reading/sentence splitting. We wait on that block for an amount of
time comparable to the wait time in the analysis and check blocks,
but there all but 1 thread are idle, while in the other two cases
only some threads are waiting. Also, I feel that because of the way
most texts are structured we get a lot of small paragraphs (the
default logic splits on two newlines, which yields many small
paragraphs for titles, notes, dialogs etc.), which means we start the
analysis/check blocks too often with small chunks of data, thus
losing performance.
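
As an illustration of that batching idea (the threshold and names are
made up, not tuned values): consecutive small paragraphs could be
merged until a minimum size is reached, keeping the double-newline
boundary so that paragraph-level rules still see whole paragraphs.

import java.util.List;

public class BatchingSketch {

  static final int MIN_CHARS = 4000;        // illustrative threshold

  public static void dispatchBatched(List<String> paragraphs) {
    StringBuilder batch = new StringBuilder();
    for (String p : paragraphs) {
      batch.append(p).append("\n\n");       // keep the paragraph boundary
      if (batch.length() >= MIN_CHARS) {
        analyzeAndCheck(batch.toString());  // start analysis/check less often
        batch.setLength(0);
      }
    }
    if (batch.length() > 0) {               // flush the remainder
      analyzeAndCheck(batch.toString());
    }
  }

  static void analyzeAndCheck(String chunk) { /* feed to the pipeline */ }
}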

For now, I think that changing one number on 1 line to get about a
25% speedup is worth applying until we have a better solution. :)


Andriy





2016-01-28 12:46 GMT-05:00 Silvan Jegen <m...@sillymon.ch>:
> Heyhey
>
> On Thu, Jan 28, 2016 at 08:57:49AM -0500, Andriy Rysin wrote:
>> yes, that's the case I tried with # of callables = # of rules (see my
>> previous email); the wait time went down quite a bit (as expected) but
>> the overall processing time went up, I suspect because of split/merge
>> overhead. But this depends heavily on the type/number of rules, the
>> text and the CPU (e.g. if rule processing time is more unbalanced than
>> in the Ukrainian case then increasing the # of callables will help;
>> otherwise the effect could be the reverse), so we would have to try
>> other languages with
>
> I may have missed a statistic in your earlier mail, but wouldn't
> splitting up the text into sentences and then sending batches of them
> to different threads result in the most even load distribution?
> Because of the result-merging overhead this would only make sense when
> the number of lines crosses a certain threshold.
>
>
> Cheers,
>
> Silvan
>
>> a different number of callables to see what the best approach is.
>> I know we have regular Wikipedia checks for some languages - that
>> could be a good benchmarking test.
>>
>> Regards,
>> Andriy
>>
>> 2016-01-28 8:47 GMT-05:00 Dominique Pellé <dominique.pe...@gmail.com>:
>> > Andriy Rysin wrote:
>> >
>> >
>> >> Then I realized that in the check method we split the rules into
>> >> callables whose count is the # of cores available (in my case 8). As I
>> >> have 347 rules this means each bucket is about 43 rules, and since
>> >> rules are not equal in complexity this could lead to quite unequal
>> >> processing time for each thread.
>> >
>> > Hi Andriy
>> >
>> > Thanks for having a look at multi-thread performance.
>> > I don't know the code as well as you do. But if we indeed
>> > split the number of rules equally before processing them, then
>> > that seems bad for balancing the work.
>> >
>> > Can't we instead have a queue with all the rules to be processed?
>> > When a thread is ready to do work, it picks the next rule to process
>> > from the queue. That way the load would be well balanced, even if
>> > some rules are 10x slower than others. With such a queue, a thread
>> > that picks up an expensive rule would end up processing fewer rules
>> > than a thread that picks up fast rules, keeping all CPUs as busy as
>> > possible.
>> >
>> > Regards
>> > Dominique
>> >