Thanks Ivan,

I'm running the experiments with my models, using the text-based
phrase-table that you used. The experiments are still running; they may
take a week to finish.

However, preliminary results suggest good scalability with Moses2.

https://docs.google.com/spreadsheets/d/15S1m-6MXNmxc47UiS-AyHyHCLWtGEdxaAvU_0plsPjo/edit?usp=sharing
My models are here for you to download and test yourself if you like:
   http://statmt.org/~s0565741/download/for-ivan/

Below are my thoughts on possible reasons why there are discrepancies in
what we're seeing:
   1. You may have parameters in your moses.ini which are vastly different
from mine and suboptimal for speed and scalability. We won't know until we
compare our two setups.
   2. Our phrase-tables are of vastly different sizes. Your phrase-table
is 1.3GB, mine is 40GB (unzipped). I also tested on a 15GB phrase-table in
our AMTA paper and again got good scalability, but I have not tried one as
small as yours. There may be phenomena that cause Moses2 to perform badly
with small models.
   3. You loaded all models into memory; I loaded the phrase-table into
memory but had to use binary LM and reordering models. My models are too
large to load into RAM (they take up more RAM than their file sizes
suggest).
   4. You may also be running out of RAM by loading everything into
memory, causing disk swapping.
   5. Your test set (1788 sentences) is too small. My test set is 800,000
sentences (5,847,726 tokens). The decoders rely on CPU caches (L1, L2,
etc.) for speed. There are also setup costs for each decoding thread (e.g.
creating memory pools in Moses2). If your experiments are over too quickly,
you may be measuring the decoder in its 'warm-up lap' rather than when it
is running at terminal velocity. Your quickest decoding experiments took 25
sec; my quickest took 200 sec.
   6. I think the way you exclude load time is unreliable. You exclude
load time by subtracting the average load time from the total time.
However, load time is several times longer than decoding time, so any
variation in load time will swamp the decoding time. I use the decoder's
own debug timing output instead (see the sketch below).
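
To make point 6 concrete, here is a rough Python sketch of how I would
turn your plain data files into #threads vs. words per second, and where
I think the noise creeps in. The file name and the assumption of
whitespace-separated columns are mine, not yours, so please adjust to
your actual format:

    # Rough sketch only (not tested against your files): convert the
    # 5-column data files you describe (threads, avg total runtime, sd,
    # avg load runtime, sd) into #threads vs. words per second, using
    # the 49,582-token MT04 set. "moses2.dat" is just a placeholder.
    TOKENS = 49582

    with open("moses2.dat") as f:
        for line in f:
            threads, avg_total, sd_total, avg_load, sd_load = map(float, line.split())
            decode_time = avg_total - avg_load   # the subtraction I'm wary of
            wps = TOKENS / decode_time if decode_time > 0 else float("nan")
            # If sd_total and sd_load are of the same order as decode_time,
            # wps is dominated by load-time noise rather than decoding speed.
            print(int(threads), round(wps, 1))

Reading the decode time directly from the decoder's own debug timing
output avoids the subtraction entirely, which is why I trust it more.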

If you can share your models, we might be able to find the reason for
the difference in our results. I can provide you with ssh/scp access to my
server if you need it.



* Looking for MT/NLP opportunities *
Hieu Hoang
http://moses-smt.org/


On 4 April 2017 at 09:00, Ivan Zapreev <[email protected]> wrote:

> Dear Hieu,
>
> Please see the answers below.
>
> Can you clarify a few things for me.
>>   1. How many sentences and words were in the test set you used to measure
>> decoding speed? Are there many duplicate sentences, i.e. did you create a
>> large test set by concatenating the same small test set multiple times?
>>
>
> We ran the experiments on the same MT04 Chinese text that we tuned the
> system on. The text consists of 1788 unique sentences and 49582 tokens.
>
>
>>   2. Are the model sizes you quoted for the gzipped or unzipped text
>> files, or are they the sizes of the models as loaded into memory?
>>
>
> These are the plain-text models as stored on the hard drive.
>
>
>>   3. Can you please reformat this graph
>>         https://github.com/ivan-zapreev/Basic-Translation-Infrastructure/blob/master/doc/images/experiments/servers/stats.time.tools.log.png
>>       as #threads vs. words per second, i.e. don't use log scale and
>> don't use decoding time.
>>
>
> The plot is attached, but this one is not about words per second; it shows
> the decoding run-times (as in the link you sent). The non-log-scale plot,
> as you will see, is hard to read. I also attach the plain data files for
> moses and moses2 with the column values as follows:
>
> number of threads | average runtime (decoding + model loading) | standard
> deviation | average runtime (model loading) | standard deviation
>
> --
> Best regards,
>
> Ivan
> <http://www.tainichok.ru/>
>
