On Sat, Sep 17, 2016 at 15:23, Matt Post <[email protected]> wrote:
> I'll ask Hieu; I don't anticipate any problems. One potential problem is
> that the models occupy about 15--20 GB; do you think Jenkins would host
> this?
>

I'm not sure. Can such models be downloaded and pruned at runtime, or do
they need to exist on the Jenkins machine?

> (ru-en grammars still packing, results will probably not be in until much
> later today)
>
> matt
>
>
> > On Sep 17, 2016, at 3:19 PM, Tommaso Teofili <[email protected]> wrote:
> >
> > Hi Matt,
> >
> > I think it'd be really valuable if we could repeat the same tests
> > (given a parallel corpus is available) in the future. Any chance you
> > can share the script / code to do that? We may even consider adding a
> > Jenkins job dedicated to continuously monitoring performance as we
> > work on the Joshua master branch.
> >
> > WDYT?
> >
> > Anyway, thanks for sharing the very interesting comparisons.
> > Regards,
> > Tommaso
> >
> > On Sat, Sep 17, 2016 at 12:29, Matt Post <[email protected]> wrote:
> >
> >> Ugh, I think the mailing list deleted the attachment. Here is an
> >> attempt around our censors:
> >>
> >> https://www.dropbox.com/s/80up63reu4q809y/ar-en-joshua-moses2.png?dl=0
> >>
> >>
> >>> On Sep 17, 2016, at 12:21 PM, Matt Post <[email protected]> wrote:
> >>>
> >>> Hi everyone,
> >>>
> >>> One thing we did this week at MT Marathon was a speed comparison of
> >>> Joshua 6.1 (release candidate) with Moses2, which is a ground-up
> >>> rewrite of Moses designed for speed (see the attached paper). Moses2
> >>> is 4–6x faster than Moses phrase-based, and 100x (!) faster than
> >>> Moses hiero.
> >>>
> >>> I tested using two moderate-to-large datasets that Hieu Hoang (CC'd)
> >>> provided me with: ar-en and ru-en. Timing results are from 10,000
> >>> sentences in each corpus. The average ar-en sentence length is 7.5,
> >>> and for ru-en it is 28. I only ran one test for each language, so
> >>> there could be some variance if I averaged, but I think the results
> >>> look pretty consistent. The timing is end-to-end (including model
> >>> load times, which Moses2 tends to be a bit faster at).
> >>>
> >>> Note also that Joshua does not have lexicalized distortion, while
> >>> Moses2 does. This means the BLEU scores are a bit lower for Joshua:
> >>> 62.85 versus 63.49. This shouldn't really affect runtime, however.
> >>>
> >>> I'm working on the ru-en, but here are the ar-en results:
> >>>
> >>>
> >>>
> >>> Some conclusions:
> >>>
> >>> - Hieu has done some bang-up work on the Moses2 rewrite! Joshua is
> >>>   in general about 3x slower than Moses2.
> >>>
> >>> - We don't have a Moses comparison, but extrapolating from Hieu's
> >>>   paper, it seems we might be as fast as or faster than Moses
> >>>   phrase-based decoding, and are a ton faster on Hiero. I'm going to
> >>>   send my models to Hieu so he can test on his machine, and then
> >>>   we'll have a better feel for this, including how it scales on a
> >>>   machine with many more processors.
> >>>
> >>> matt
> >>>
> >>
> >
