I think it would be really valuable to be able to repeat the same tests
in the future (provided the parallel corpus is available). Any chance you
could share the scripts or code to do that? We might even consider adding
a Jenkins job dedicated to continuously monitoring performance as we work
on the Joshua master branch.
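
As a rough illustration of the kind of harness such a job could call, here
is a minimal Python sketch. It assumes the decoder reads one sentence per
line from stdin and measures end-to-end wall-clock time (model loading
included). The function names and the way the decoder is invoked are
hypothetical, not taken from Joshua or Moses2.

```python
import subprocess
import sys
import time


def time_decoder(command, input_path):
    """Run a decoder command end-to-end and time it.

    `command` is an argument list (hypothetical; substitute the real
    decoder invocation). The input file is fed to the process on stdin,
    one sentence per line. Returns (elapsed_seconds, sentence_count).
    """
    # Count the input sentences up front so throughput can be reported.
    with open(input_path, "rb") as source:
        sentence_count = sum(1 for _ in source)

    with open(input_path, "rb") as source:
        start = time.perf_counter()
        # Translations are discarded; only the wall-clock time matters here.
        subprocess.run(command, stdin=source,
                       stdout=subprocess.DEVNULL, check=True)
        elapsed = time.perf_counter() - start

    return elapsed, sentence_count


def sentences_per_second(elapsed, sentence_count):
    """Throughput in sentences per second for one timed run."""
    return sentence_count / elapsed if elapsed > 0 else float("inf")
```

Calling time_decoder() once per decoder on the same input file and
comparing the sentences_per_second() values would give numbers that can
be tracked from build to build.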


Anyway, thanks for sharing these very interesting comparisons.

> Ugh, I think the mailing list deleted the attachment. Here is an attempt
> around our censors:
> https://www.dropbox.com/s/80up63reu4q809y/ar-en-joshua-moses2.png?dl=0
> > One thing we did this week at MT Marathon was a speed comparison of
> > Joshua 6.1 (release candidate) with Moses2, which is a ground-up
> > rewrite of Moses designed for speed (see the attached paper). Moses2 is
> > 4–6x faster than Moses phrase-based, and 100x (!) faster than Moses
> > hiero.
> >
> > I tested using two moderate-to-large sized datasets that Hieu Hoang
> > (CC'd) provided me with: ar-en and ru-en. Timing results are from
> > 10,000 sentences in each corpus. The average ar-en sentence length is
> > 7.5, and for ru-en is 28. I only ran one test for each language, so
> > there could be some variance if I averaged, but I think the results
> > look pretty consistent. The timing is end-to-end (including model load
> > times, which Moses2 tends to be a bit faster at).
> >
> > Note also that Joshua does not have lexicalized distortion, while
> > Moses2 does. This means the BLEU scores are a bit lower for Joshua:
> > 62.85 versus 63.49. This shouldn't really affect runtime, however.
> >
> > I'm working on the ru-en, but here are the ar-en results:
> >
> > Some conclusions:
> >
> > - Hieu has done some bang-up work on the Moses2 rewrite! Joshua is in
> >   general about 3x slower than Moses2
> >
> > - We don't have a Moses comparison, but extrapolating from Hieu's
> >   paper, it seems we might be as fast as or faster than Moses
> >   phrase-based decoding, and are a ton faster on Hiero. I'm going to
> >   send my models to Hieu so he can test on his machine, and then we'll
> >   have a better feel for this, including how it scales on a machine
> >   with many more processors.
