It's true that for this experiment (no cube-pruning, loading the phrase-table into memory) Moses scales as well as Moses2, and that Moses2 is only 3 times faster, rather than 10 times.
However, the results on the website and in the paper are correct when using cube-pruning and a binary phrase-table (compact pt for Moses, probing pt for Moses2). I've never tested your setup until now, and you've never tested mine, so we're talking at cross purposes. I will add my recent results to the website to make it more complete.

* Looking for MT/NLP opportunities *
Hieu Hoang
http://moses-smt.org/

On 10 April 2017 at 10:07, Ivan Zapreev <[email protected]> wrote:

> Dear Hieu,
>
> Thank you very much for your e-mail and all the effort you put into
> re-running the experiments!
>
> I think, however, that an Excel sheet of values gives a rather obscured
> view of the results. I would rather see plots of pure decoding times, or
> wps, or the speed increment per number of cores.
>
> Anyhow, I see that my main concern about the Moses vs Moses2 scalability
> is actually confirmed: the speed-up of Moses2 is not so much related to
> better multi-threading but is more of a single-thread decoding speed
> improvement, and the original results listed on the website are flawed.
>
> Regarding the reasons you gave me in the previous e-mail: not that I find
> them important now that I see that you have the same results, but I
> already pointed out that the cold/hot data issue is not possible, so we
> can rule it out. The "parameter mismatch" reason also sounded strange to
> me, as all the parameters I used are listed in the experimental set-up
> section:
> https://github.com/ivan-zapreev/Basic-Translation-Infrastructure#test-set-up-1
> So there is nothing else that could be different except for the models
> and the texts themselves.
>
> Kind regards,
>
> Dr. Ivan S. Zapreev
>
> On Sun, Apr 9, 2017 at 8:42 PM, Hieu Hoang <[email protected]> wrote:
>
>> Hi Ivan,
>>
>> I've finished running my experiments with the vanilla phrase-based
>> algorithm and the memory phrase-table. The results are here:
>> https://docs.google.com/spreadsheets/d/15S1m-6MXNmxc47UiS-AyHyHCLWtGEdxaAvU_0plsPjo/edit?usp=sharing
>> Summary:
>> 1. Moses2 is about 3 times faster than Moses.
>> 2. Both decoders are 15-16 times faster running with 32 threads than on
>> 1 thread (on a 16-core/32-hyperthread server).
>> 3. Moses2 with the binary phrase-table is slightly faster than loading
>> the pt into memory.
>>
>> I'm happy with the speed of Moses2, and with its scalability wrt the
>> number of cores. The scalability is in line with that reported on the
>> website and in the paper.
>>
>> The original Moses decoder also seems to have similar scalability,
>> contrary to my previous results. I have some explanation for it, but I'm
>> not too concerned; it's great that Moses is also good!
>>
>> This doesn't correlate with some of your findings; I've outlined some
>> possible reasons in the last email.
>>
>> * Looking for MT/NLP opportunities *
>> Hieu Hoang
>> http://moses-smt.org/
>>
>> On 4 April 2017 at 11:01, Ivan Zapreev <[email protected]> wrote:
>>
>>> Dear Hieu,
>>>
>>> Thank you for the feedback and the info. I am not sure what you mean by
>>> "good scalability"; I cannot really visualize the plots from numbers in
>>> my head. Sorry.
>>>
>>> Using bigger models is indeed always good, but I used the biggest that
>>> were available.
>>>
>>> I did make sure there was no swapping; I already mentioned it.
>>>
>>> I did take the average run times for decoding plus loading, and for
>>> loading alone, with standard deviations. The latter show that the way
>>> things were measured was reliable.
>>>
>>> The L1 and L2 cache issues do not sound convincing to me.
>>> The caches are just a few MB and the models you work with are
>>> gigabytes. There will always be cache misses in this setting. The only
>>> issue I can think of is that if the data is not fully pre-loaded into
>>> RAM then you get a cold run, but not more than that.
>>>
>>> I think if you finish the runs and could then plot the results, we
>>> could see a clearer picture...
>>>
>>> Thanks again!
>>>
>>> Kind regards,
>>>
>>> Ivan
>>>
>>> On Tue, Apr 4, 2017 at 11:40 AM, Hieu Hoang <[email protected]> wrote:
>>>
>>>> Thanks Ivan,
>>>>
>>>> I'm running the experiments with my models, using the text-based
>>>> phrase-table that you used. The experiments are still running; they may
>>>> take a week to finish.
>>>>
>>>> However, preliminary results suggest good scalability with Moses2:
>>>> https://docs.google.com/spreadsheets/d/15S1m-6MXNmxc47UiS-AyHyHCLWtGEdxaAvU_0plsPjo/edit?usp=sharing
>>>> My models are here for you to download and test yourself if you like:
>>>> http://statmt.org/~s0565741/download/for-ivan/
>>>>
>>>> Below are my thoughts on possible reasons why there are discrepancies
>>>> in what we're seeing:
>>>> 1. You may have parameters in your moses.ini which are vastly
>>>> different from mine and suboptimal for speed and scalability. We won't
>>>> know until we compare our two setups.
>>>> 2. Your phrase-table and mine are vastly different sizes. Your
>>>> phrase-table is 1.3GB, mine is 40GB (unzipped). I've also tested on a
>>>> 15GB phrase-table in our AMTA paper and also got good scalability, but
>>>> I have not tried one as small as yours. There may be phenomena that
>>>> cause Moses2 to be bad with small models.
>>>> 3. You loaded all models into memory; I loaded the phrase-table into
>>>> memory but had to use binary LM and reordering models. My models are
>>>> too large to load into RAM (they take up more RAM than the file size
>>>> suggests).
>>>> 4. You may also be running out of RAM by loading everything into
>>>> memory, causing disk swapping.
>>>> 5. Your test set (1788 sentences) is too small. My test set is 800,000
>>>> sentences (5,847,726 tokens). The decoders rely on CPU caches (L1, L2,
>>>> etc.) for speed. There are also setup costs for each decoding thread
>>>> (e.g. creating memory pools in Moses2). If your experiments are over
>>>> too quickly, you may be measuring the decoder in the 'warm-up lap'
>>>> rather than when it's running at terminal velocity. Your quickest
>>>> decoding experiments took 25 sec; my quickest took 200 sec.
>>>> 6. I think the way you exclude load time is unreliable. You exclude
>>>> load time by subtracting the average load time from the total time.
>>>> However, load time is several times longer than decoding time, so any
>>>> variation in load time will swamp the decoding time. I use the
>>>> decoder's debugging timing output.
>>>>
>>>> If you can share your models, we might be able to find out the reason
>>>> for the difference in our results. I can provide you with ssh/scp
>>>> access to my server if you need to.
>>>>
>>>> * Looking for MT/NLP opportunities *
>>>> Hieu Hoang
>>>> http://moses-smt.org/
>>>>
>>>> On 4 April 2017 at 09:00, Ivan Zapreev <[email protected]> wrote:
>>>>
>>>>> Dear Hieu,
>>>>>
>>>>> Please see the answers below.
>>>>>
>>>>>> Can you clarify a few things for me.
>>>>>> 1. How many sentences and words were in the test set you used to
>>>>>> measure decoding speed? Are there many duplicate sentences, i.e.
>>>>>> did you create a large test set by concatenating the same small test
>>>>>> set multiple times?
>>>>>
>>>>> We ran the experiments on the same MT04 Chinese text that we tuned
>>>>> the system on. The text consists of 1788 unique sentences and 49582
>>>>> tokens.
>>>>>
>>>>>> 2. Are the model sizes you quoted the gzipped text files or
>>>>>> unzipped, or the model size as it is when loaded into memory?
>>>>>
>>>>> These are the plain-text models as stored on the hard drive.
>>>>>
>>>>>> 3. Can you please reformat this graph
>>>>>> https://github.com/ivan-zapreev/Basic-Translation-Infrastructure/blob/master/doc/images/experiments/servers/stats.time.tools.log.png
>>>>>> as #threads v. words per second, i.e. don't use log, don't use
>>>>>> decoding time.
>>>>>
>>>>> The plot is attached, but this one is not about words per second; it
>>>>> shows the decoding run-times (as in the link you sent). The
>>>>> non-log-scale plot, as you will see, is hard to read. I also attach
>>>>> the plain data files for moses and moses2 with the column values as
>>>>> follows:
>>>>>
>>>>> number of threads | average runtime decoding + model loading |
>>>>> standard deviation | average runtime model loading | standard
>>>>> deviation
>>>>>
>>>>> --
>>>>> Best regards,
>>>>>
>>>>> Ivan
>>>>> <http://www.tainichok.ru/>
>>>
>>> --
>>> Best regards,
>>>
>>> Ivan
>>> <http://www.tainichok.ru/>
>
> --
> Best regards,
>
> Ivan
> <http://www.tainichok.ru/>
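
For reference, the words-per-second view Hieu asks for in question 3 can be derived from the five-column data files Ivan describes above (threads, mean decode+load time and its standard deviation, mean load time and its standard deviation) by subtracting the load time and dividing the 49582-token test set by the remainder. Below is a minimal sketch of that conversion; the file names moses.dat and moses2.dat and the use of matplotlib are assumptions for illustration, not part of the thread.

    import matplotlib.pyplot as plt

    TOKENS = 49582  # size of the MT04 test set quoted in the thread

    def words_per_second(path):
        """Read rows of 'threads  mean(decode+load)  sd  mean(load)  sd' and
        return (threads, words per second) with the load time subtracted."""
        threads, wps = [], []
        with open(path) as f:
            for line in f:
                if not line.strip() or line.startswith("#"):
                    continue
                n, total, _sd_total, load, _sd_load = map(float, line.split())
                decode = total - load  # the subtraction Hieu questions in point 6
                threads.append(int(n))
                wps.append(TOKENS / decode)
        return threads, wps

    # Hypothetical file names for the two attached data files.
    for name in ("moses", "moses2"):
        t, w = words_per_second(f"{name}.dat")
        plt.plot(t, w, marker="o", label=name)

    plt.xlabel("threads")
    plt.ylabel("words per second")
    plt.legend()
    plt.savefig("threads_vs_wps.png")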
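
Hieu's point 6 can also be made concrete with a back-of-the-envelope check: when decoding time is recovered as (load + decode) minus a separately averaged load time, the run-to-run noise of both measurements adds up, and next to a 25-second decode it can dominate the number being reported. The figures below are purely illustrative assumptions, not measurements from either set of experiments.

    from math import sqrt

    # Illustrative (assumed) numbers: a 25 s decode hidden inside a much longer
    # load phase, with modest run-to-run variation in both timings.
    decode_true = 25.0   # seconds actually spent decoding
    load_sd = 10.0       # assumed stddev of the load-only runs
    total_sd = 12.0      # assumed stddev of the combined load+decode runs

    # Estimating decode time as total minus the *average* load time combines
    # the noise of two independent measurements.
    est_sd = sqrt(total_sd ** 2 + load_sd ** 2)
    print(f"estimated decode time: {decode_true:.0f} s +/- {est_sd:.1f} s "
          f"({100 * est_sd / decode_true:.0f}% relative uncertainty)")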
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
