Dear Hieu,

Thank you very much for your e-mail and all the effort you put into
re-running the experiments!

I think, however, that an Excel sheet of values gives a rather obscured view
of the results. I would prefer to see plots of pure decoding times, words
per second (wps), or the speed-up per number of cores.
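
For example, something along these lines would do (a rough matplotlib
sketch, assuming the data files follow the column layout of the files I
attached earlier in this thread and that the first row is the 1-thread run;
the file names are placeholders):

    import numpy as np
    import matplotlib.pyplot as plt

    TOKENS = 49582  # tokens in the MT04 test set

    def decode_times(path):
        # columns: threads | avg decode+load | std | avg load | std
        data = np.loadtxt(path)
        threads = data[:, 0]
        decode = data[:, 1] - data[:, 3]  # pure decoding = total - loading
        return threads, decode

    fig, (ax_wps, ax_spd) = plt.subplots(1, 2)
    for name in ("moses.dat", "moses2.dat"):  # placeholder file names
        threads, decode = decode_times(name)
        ax_wps.plot(threads, TOKENS / decode, marker="o", label=name)
        ax_spd.plot(threads, decode[0] / decode, marker="o", label=name)

    ax_wps.set(xlabel="threads", ylabel="words per second")
    ax_spd.set(xlabel="threads", ylabel="speed-up vs 1 thread")
    ax_wps.legend()
    plt.tight_layout()
    plt.show()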

Anyhow, I see that my main concern about the Moses vs Moses2 scalability is
actually confirmed: the speed-up of Moses2 is not so much a matter of better
multi-threading as of a single-thread decoding speed improvement (both
decoders scale equally well with the number of threads), and the original
results listed on the website are flawed.
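
Just to spell out the arithmetic (with approximate numbers from your
summary, purely as an illustration):

    # Moses2 is ~3x faster overall, and both decoders scale ~15-16x from
    # 1 to 32 threads, i.e. their parallel efficiency is about the same.
    single_thread_gap = 3.0                # Moses2 vs Moses on 1 thread
    scaling_moses = scaling_moses2 = 15.5  # 32-thread scaling of each
    gap_at_32 = single_thread_gap * scaling_moses2 / scaling_moses
    print(gap_at_32)  # ~3.0: threading changes nothing, so the gap comes
                      # from single-thread speed, not from scalability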

Regarding the reasons you gave me in the previous e-mail: not that I find
them important now that I see you get the same results, but I already
pointed out that the cold/hot data issue is not possible, so we can rule it
out. The suggested "parameter mismatch" also sounded strange to me, as all
the parameters I used are listed in the experimental set-up section:
https://github.com/ivan-zapreev/Basic-Translation-Infrastructure#test-set-up-1
So there is nothing else that could differ except for the models and the
texts themselves.

Kind regards,

Dr. Ivan S. Zapreev

On Sun, Apr 9, 2017 at 8:42 PM, Hieu Hoang <[email protected]> wrote:

> Hi Ivan
>
> I've finished running my experiments with the vanilla phrase-based
> algorithm and memory phrase-table. The results are here:
>    https://docs.google.com/spreadsheets/d/15S1m-6MXNmxc47UiS-AyHyHCLWtGEdxaAvU_0plsPjo/edit?usp=sharing
> Summary:
>   1. Moses2 is about 3 times faster than Moses.
>   2. Both decoders are 15-16 times faster running with 32 threads than on
> 1 thread (on a 16-core/32-hyperthread server).
>   3. Moses2 with the binary phrase-table is slightly faster than with the
> phrase-table loaded into memory.
>
> I'm happy with the speed of Moses2, and with its scalability wrt the
> number of cores. The scalability is in line with that reported on the
> website and in the paper.
>
> The original Moses decoder also seems to have similar scalability,
> contrary to my previous results. I have some explanations for it, but I'm
> not too concerned; it's great that Moses is also good!
>
> This doesn't agree with some of your findings; I've outlined some possible
> reasons in the last email.
>
>
> * Looking for MT/NLP opportunities *
> Hieu Hoang
> http://moses-smt.org/
>
>
> On 4 April 2017 at 11:01, Ivan Zapreev <[email protected]> wrote:
>
>> Dear Hieu,
>>
>> Thank you for the feedback and the info. I am not sure what you mean by
>> "good scalability"; I cannot really visualize the plots from the numbers
>> in my head. Sorry.
>>
>> Using bigger models is indeed always good, but I used the biggest ones
>> that were available.
>>
>> I did make sure there was no swapping; I already mentioned that.
>>
>> I did take the average run times, with standard deviations, for model
>> loading plus decoding and for loading alone. The standard deviations show
>> that the measurements were reliable.
>>
>> The L1 and L2 cache issue does not sound convincing to me. The caches are
>> only a few MBs, and the models you work with are gigabytes, so there will
>> always be cache misses in this setting. The only issue I can think of is
>> that if the data is not fully pre-loaded into RAM, you get a cold run,
>> but not more than that.
>>
>> I think that once you finish the runs and plot the results, we will see a
>> clearer picture...
>>
>> Thanks again!
>>
>> Kind regards,
>>
>> Ivan
>>
>>
>> On Tue, Apr 4, 2017 at 11:40 AM, Hieu Hoang <[email protected]> wrote:
>>
>>> Thanks Ivan,
>>>
>>> I'm running the experiments with my models, using the text-based
>>> phrase-table that you used. The experiments are still running; they may
>>> take a week to finish.
>>>
>>> However, preliminary results suggest good scalability with Moses2.
>>>    https://docs.google.com/spreadsheets/d/15S1m-6MXNmxc47UiS-AyHyHCLWtGEdxaAvU_0plsPjo/edit?usp=sharing
>>> My models are here for you to download and test yourself if you like:
>>>    http://statmt.org/~s0565741/download/for-ivan/
>>>
>>> Below are my thoughts on possible reasons for the discrepancies in what
>>> we're seeing:
>>>    1. You may have parameters in your moses.ini which are vastly
>>> different from mine and suboptimal for speed and scalability. We won't
>>> know until we compare our two setups.
>>>    2. Your phrase-table and mine are of vastly different sizes: yours is
>>> 1.3GB, mine is 40GB (unzipped). I also tested a 15GB phrase-table in our
>>> AMTA paper and got good scalability, but I have not tried one as small
>>> as yours. There may be some phenomenon that causes Moses2 to do badly
>>> with small models.
>>>    3. You loaded all models into memory; I loaded the phrase-table into
>>> memory but had to use binary LM and reordering models. My models are too
>>> large to load into RAM (they take up more RAM than the file size
>>> suggests).
>>>   4. You may also be running out of RAM by loading everything into
>>> memory, causing disk swapping.
>>>   5. Your test set (1788 sentences) is too small. My test set is 800,000
>>> sentences (5,847,726 tokens). The decoders rely on CPU caches (L1, L2,
>>> etc.) for speed. There are also setup costs for each decoding thread
>>> (e.g. creating memory pools in Moses2). If your experiments are over too
>>> quickly, you may be measuring the decoder in the 'warm-up lap' rather
>>> than when it's running at terminal velocity. Your quickest decoding
>>> experiments took 25 sec; my quickest took 200 sec.
>>>   6. I think the way you exclude load time is unreliable. You exclude
>>> load time by subtracting the average load time from the total time.
>>> However, load time is several times longer than decoding time, so any
>>> variation in load time will swamp the decoding time. I use the decoder's
>>> debugging timing output instead.
>>>
>>> If you can share your models, we might be able to find out the reason
>>> for the difference in our results. I can provide you with ssh/scp access
>>> to my server if you need it.
>>>
>>>
>>>
>>> * Looking for MT/NLP opportunities *
>>> Hieu Hoang
>>> http://moses-smt.org/
>>>
>>>
>>> On 4 April 2017 at 09:00, Ivan Zapreev <[email protected]> wrote:
>>>
>>>> Dear Hieu,
>>>>
>>>> Please see the answers below.
>>>>
>>>>> Can you clarify a few things for me.
>>>>>   1. How many sentences and words were in the test set you used to
>>>>> measure decoding speed? Are there many duplicate sentences, i.e. did
>>>>> you create a large test set by concatenating the same small test set
>>>>> multiple times?
>>>>>
>>>>
>>>> We ran the experiments on the same MT04 Chinese text that we used to
>>>> tune the system. The text consists of 1788 unique sentences and 49582
>>>> tokens.
>>>>
>>>>
>>>>>   2. Are the model sizes you quoted those of the gzipped text files,
>>>>> the unzipped files, or the models as loaded into memory?
>>>>>
>>>>
>>>> These are the sizes of the plain-text models as stored on the hard
>>>> drive.
>>>>
>>>>
>>>>>   3. Can you please reformat this graph
>>>>>         https://github.com/ivan-zapreev/Basic-Translation-Infrastructure/blob/master/doc/images/experiments/servers/stats.time.tools.log.png
>>>>>       as #threads vs. words per second, i.e. don't use a log scale and
>>>>> don't use decoding time.
>>>>>
>>>>
>>>> The plot is attached, but this one is not about words per second; it
>>>> shows the decoding run-times (as in the link you sent). The
>>>> non-log-scale plot, as you will see, is hard to read. I also attach the
>>>> plain data files for moses and moses2, with the column values as
>>>> follows:
>>>>
>>>> number of threads | average runtime decoding + model loading | standard
>>>> deviation | average runtime model loading | standard deviation
>>>>
>>>> --
>>>> Best regards,
>>>>
>>>> Ivan
>>>> <http://www.tainichok.ru/>
>>>>
>>>
>>>
>>
>>
>> --
>> Best regards,
>>
>> Ivan
>> <http://www.tainichok.ru/>
>>
>
>


-- 
Best regards,

Ivan
<http://www.tainichok.ru/>