Hi Shubham,
As far as I can tell, Mosesserver is beneficial when you want/need to
parallelize per-segment pre- and post-processing on a different machine
from the Mosesserver process. However, most (but not all) pre- and
post-processing steps impose minimal computational overhead. They can
typically process an entire batch of sentences (a document, or even
multiple documents) in one thread in a fraction of the time it takes to
run Moses.
So, I'm not sure what you intend to gain by running Mosesserver. It all
seems like a lot of unnecessary work. Why not encapsulate the standard
Moses executable in one daemonized top-level process? Include your pre-
and post-processing toolchains as sub-processes within the same daemon
and use standard pipes to/from Moses. Then use sockets (or another
mechanism) to pump docs in/out of the daemon. You have one consistent
interface that runs the same regardless of whether your batch has one
sentence or 300. Moses' per-sentence multi-threading kicks in
automatically.
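
A minimal Python sketch of the setup described above, in case it helps.
It is only an illustration under stated assumptions, not how any existing
product is implemented. The names LinePipe, BatchHandler, make_server and
send_batch are made up for this example, and STAND_IN_CMD is a dummy
child process (it upper-cases each line) so the sketch runs without
Moses. In a real setup you would replace it with your own pre-processing,
Moses and post-processing chain, and the sketch assumes that chain emits
exactly one flushed output line per input line.

```python
import socket
import socketserver
import subprocess
import sys
import threading

# Stand-in child so the sketch runs without Moses: upper-cases each line.
# Assumption: the real command chains your pre-processing, the moses
# executable and post-processing, one flushed output line per input line.
STAND_IN_CMD = [
    sys.executable, "-u", "-c",
    "import sys\n"
    "for line in sys.stdin:\n"
    "    print(line.strip().upper(), flush=True)\n",
]


class LinePipe:
    """Owns one long-lived child process and exchanges lines over its pipes."""

    def __init__(self, cmd):
        self.proc = subprocess.Popen(
            cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE,
            encoding="utf-8", bufsize=1)
        # One batch at a time, so outputs of two clients never interleave.
        self.lock = threading.Lock()

    def translate_batch(self, sentences):
        with self.lock:
            def feed():
                # Write the whole batch up front so the child can work on
                # several sentences at once.
                for s in sentences:
                    self.proc.stdin.write(" ".join(s.split()) + "\n")
                self.proc.stdin.flush()

            # Feeding from a separate thread avoids a pipe deadlock on
            # large batches (child blocked on output while we block on input).
            writer = threading.Thread(target=feed)
            writer.start()
            out = [self.proc.stdout.readline().rstrip("\n") for _ in sentences]
            writer.join()
            return out

    def close(self):
        self.proc.stdin.close()
        self.proc.wait()


class BatchHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # One request = every line the client sends before it closes its
        # write side of the connection.
        raw_lines = self.rfile.read().splitlines()
        sentences = [raw.decode("utf-8") for raw in raw_lines]
        for line in self.server.pipe.translate_batch(sentences):
            self.wfile.write((line + "\n").encode("utf-8"))


def make_server(pipe, host="127.0.0.1", port=0):
    # port=0 lets the OS pick a free port; read it from server.server_address.
    server = socketserver.ThreadingTCPServer((host, port), BatchHandler)
    server.pipe = pipe
    return server


def send_batch(address, sentences):
    """Client side: send a batch, half-close, then read all result lines."""
    with socket.create_connection(address) as conn:
        conn.sendall("".join(s + "\n" for s in sentences).encode("utf-8"))
        conn.shutdown(socket.SHUT_WR)
        chunks = []
        while True:
            chunk = conn.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    return [raw.decode("utf-8") for raw in b"".join(chunks).splitlines()]
```

To try it: create pipe = LinePipe(STAND_IN_CMD), start
make_server(pipe).serve_forever() in a thread, and call
send_batch(server.server_address, [...]) from the client. The same
interface handles a batch of one sentence or of 300.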
Re "then it takes around 10 seconds to translate..." For a 10-token
sentence?! That's extremely excessive. The link below shows one of our
customer's experiments with his EN-RU Slate Desktop for Windows system
(you'll need to register/log in to see it, but registration is free).
Slate Desktop has a Moses kernel running natively on Windows (i.e. no
CYGWIN). It uses a technique similar to the one I described above. The
customer's experiments average 1.5 seconds per sentence for up to 40
tokens.
http://support.pttools.net/support/discussions/topics/6000042166
These times include pre-/post-processing and MT connector overhead
between memoQ and his engine. The SMT model uses the slower (now legacy)
binarized phrase/reordering tables and binarized KenLM. The newer
compact tables would run faster, but there were complications making
them work on Windows. The CAT tool feeds sentence-by-sentence, so Moses
is effectively running single-threaded. Linux performance is comparable,
maybe a little (5%) faster. You're running 6-7 times slower! I expect
your performance bottleneck lies somewhere else.
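
One way to locate the bottleneck is to time each stage separately. Below
is a small Python sketch; time_stages is a hypothetical helper written
for this email, and the stage functions you pass in are your own wrappers
around pre-processing, decoding and post-processing.

```python
import time


def time_stages(stages, text):
    """Run text through each (name, function) stage in order and
    record the wall-clock seconds spent in each stage."""
    timings = {}
    for name, fn in stages:
        start = time.perf_counter()
        text = fn(text)
        timings[name] = time.perf_counter() - start
    return text, timings
```

If the decode stage accounts for nearly all of the 10 seconds, look at
the model and decoder settings. If not, look at the surrounding tooling.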
Tom
On 12/29/2016 4:51 AM, [email protected] wrote:
Date: Thu, 29 Dec 2016 00:53:41 +0530
From: Shubham Khandelwal<[email protected]>
Subject: [Moses-support] Need help for parallelisation in mosesserver
To: moses-support<[email protected]>
Hello,
As mosesserver accepts only one sentence at a time, I am creating
another component in front of mosesserver to handle tokenisation, casing
and splitting, and to take care of parallelisation.
Following is my procedure; let me know whether I am heading in the
right direction:
*---*
*Suppose I have 5 different sentences (as a paragraph) to translate
at once (fr-en). I will first start mosesserver on 5 different ports,
then pass the 5 sentences to those ports after tokenising, casing and
splitting them in parallel, and finally concatenate the output after
recasing and detokenising in parallel.*
*--*
Let me know whether this is correct. If not, please suggest a better
solution.
Also, I have one more question. Suppose a sentence is composed of
around 10 words. When I pass this sentence for translation as follows:
-> ~/mosesdecoder/bin/mosesserver -f moses.ini -threads 16 -b 0.000000001
then it takes around 10 seconds to translate. To make it faster, I could
run this on different ports, but I don't think that is a good idea:
splitting a single sentence into multiple fragments and translating them
separately on different ports can give a different meaning than
translating the whole sentence on a single port.
So basically, my question is how to do the splitting better in such
cases so that it also allows parallelisation?
-- Yours Sincerely, Shubham Khandelwal