Dear Hieu,

Thank you for the feedback and the info. I am not sure what you mean by
"good scalability"; I cannot really visualize the plots from the raw
numbers in my head. Sorry.

Using bigger models is indeed always good, but I used the biggest ones
that were available.

I did make sure there was no swapping, as I already mentioned.

I did take the average run times for loading plus decoding and for loading
alone, each with its standard deviation.
The standard deviations show that the measurements were reliable.
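
To make that concrete, this is how I obtain the decode-only time from those
averages, and one way the standard deviations can be used to sanity-check
the subtraction. It is only a rough sketch: the column layout (threads |
avg load+decode | std | avg load | std) is that of the data files I sent,
but the numbers below are made up purely for illustration.

    import math

    # Made-up example rows in the layout of the data files I sent:
    # threads | avg load+decode (s) | std | avg load only (s) | std
    rows = [
        (1, 325.0, 2.1, 300.0, 1.8),
        (16, 303.0, 1.9, 300.0, 1.7),
    ]

    for threads, avg_total, std_total, avg_load, std_load in rows:
        decode = avg_total - avg_load  # decode-only estimate
        # Treating the two averages as independent, their deviations add in
        # quadrature; this bounds how much of the difference could be noise.
        decode_std = math.sqrt(std_total ** 2 + std_load ** 2)
        print(f"{threads:>2} threads: decode ~ {decode:.1f}s +/- {decode_std:.1f}s")

As long as the combined deviation stays small compared to the difference
itself, the subtraction is trustworthy.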

The L1 and L2 cache argument does not sound convincing to me. The caches
are only a few MB in size, while the models you work with are gigabytes, so
there will always be cache misses in this setting. The only issue I can
think of is that if the data is not fully pre-loaded into RAM you get a
cold run, but not more than that.

I think that once you finish the runs and plot the results, we will see a
clearer picture...
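
In the meantime, something along these lines could turn the data files I
sent earlier into the #threads vs. words-per-second view you asked for. It
is only a rough sketch: the 49582-token test-set size is the one quoted
below, whitespace-separated columns are assumed, and the file names are
placeholders.

    import matplotlib.pyplot as plt

    TOKENS = 49582  # size of the MT04 test set, see below

    def words_per_second(path):
        """Read 'threads  avg total  std  avg load  std' rows and return
        (threads, tokens decoded per second) pairs."""
        points = []
        with open(path) as f:
            for line in f:
                if not line.strip():
                    continue
                threads, avg_total, _, avg_load, _ = map(float, line.split())
                points.append((int(threads), TOKENS / (avg_total - avg_load)))
        return points

    # Placeholder file names; the attachments may be named differently.
    for name in ("moses.dat", "moses2.dat"):
        pts = words_per_second(name)
        plt.plot([t for t, _ in pts], [w for _, w in pts], marker="o", label=name)

    plt.xlabel("# threads")
    plt.ylabel("words per second")
    plt.legend()
    plt.show()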

Thanks again!

Kind regards,

Ivan


On Tue, Apr 4, 2017 at 11:40 AM, Hieu Hoang <[email protected]> wrote:

> Thanks Ivan,
>
> I'm running the experiments with my models, using the text-based
> phrase-table that you used. The experiments are still running; they may
> take a week to finish.
>
> However, preliminary results suggest good scalability with Moses2.
>    https://docs.google.com/spreadsheets/d/15S1m-6MXNmxc47UiS-AyHyHCLWtGEdxaAvU_0plsPjo/edit?usp=sharing
> My models are here for you to download and test yourself if you like:
>    http://statmt.org/~s0565741/download/for-ivan/
>
> Below are my thoughts on possible reasons why there are discrepancies in
> what we're seeing:
>    1. You may have parameters in your moses.ini which are vastly different
> from mine and suboptimal for speed and scalability. We won't know until we
> compare our two setups
>    2. Your phrase-table and mine are vastly different sizes. Your
> phrase-table is 1.3GB, mine is 40GB (unzipped). I've also tested on a 15GB
> phrase-table in our AMTA paper and also got good scalability, but I have
> not tried one as small as yours. There may be phenomena that cause Moses2
> to be bad with small models
>    3. You loaded all models into memory; I loaded the phrase-table into
> memory (binary format) but had to use binary LM and reordering models. My
> models are too large to load into RAM (they take up more RAM than the file
> size suggests).
>   4. You may also be running out of RAM by loading everything into memory,
> causing disk swapping
>   5. Your test set (1788 sentences) is too small. My test set is 800,000
> sentences (5,847,726 tokens). The decoders rely on CPU caches (L1, L2, etc.)
> for speed. There are also setup costs for each decoding thread (e.g.
> creating memory pools in Moses2). If your experiments are over too quickly,
> you may be measuring the decoder in the 'warm-up lap' rather than when it's
> running at terminal velocity. Your quickest decoding experiments took 25
> sec, my quickest took 200 sec.
>   6. I think the way you exclude load time is unreliable. You exclude load
> time by subtracting the average load time from the total time. However,
> load time is several times longer than decoding time, so any variation in
> load time will swamp the decoding time. I use the decoder's debugging
> timing output.
>
> If you can share your models, we might be able to find out the reason for
> the difference in our results. I can provide you with ssh/scp access to my
> server if you need it.
>
>
>
> * Looking for MT/NLP opportunities *
> Hieu Hoang
> http://moses-smt.org/
>
>
> On 4 April 2017 at 09:00, Ivan Zapreev <[email protected]> wrote:
>
>> Dear Hieu,
>>
>> Please see the answers below.
>>
>> Can you clarify a few things for me.
>>>   1. How many sentences and words were in the test set you used to
>>> measure decoding speed? Are there many duplicate sentences, i.e. did you
>>> create a large test set by concatenating the same small test set multiple
>>> times?
>>>
>>
>> We ran the experiments on the same MT04 Chinese text that we used to tune
>> the system. The text consists of 1788 unique sentences and 49582 tokens.
>>
>>
>>>   2. Are the model sizes you quoted the gzipped text files or unzipped,
>>> or the model size as it is when loaded into memory?
>>>
>>
>> These are the plain-text models as stored on the hard drive.
>>
>>
>>>   3. Can you please reformat this graph
>>>         https://github.com/ivan-zapreev/Basic-Translation-Infrastructure/blob/master/doc/images/experiments/servers/stats.time.tools.log.png
>>>       as #threads v. words per second, i.e. don't use log, don't use
>>> decoding time.
>>>
>>
>> The plot is attached, but this one is not about words per second; it
>> shows the decoding run-times (as in the link you sent). The non-log-scale
>> plot, as you will see, is hard to read. I also attach the plain data files
>> for moses and moses2, with the column values as follows:
>>
>> number of threads | average runtime decoding + model loading | standard
>> deviation | average runtime model loading | standard deviation
>>
>> --
>> Best regards,
>>
>> Ivan
>> <http://www.tainichok.ru/>
>>
>
>


-- 
Best regards,

Ivan
<http://www.tainichok.ru/>
