Thanks Matt, I'll try it out.

Regards,
Tommaso
On Mon, Oct 10, 2016 at 16:49 Matt Post <[email protected]> wrote:

> Not stupid! You can use the shell script I bundled up. Here's how I ran the timing tests.
>
> for n in 64 48 32 16 8 4 2 1; do
>     for name in moses2 joshua; do
>         echo $name $n; bash time-$name.sh > out.$name.$n 2> log.$name.$n
>     done
> done
>
> matt
>
>
> > On Oct 10, 2016, at 6:42 AM, Tommaso Teofili <[email protected]> wrote:
> >
> > Sorry if this is again a stupid question, but I'm still getting my head around all the possible execution options. Now that I've downloaded the above models, which scripts should I use to run/evaluate them so that the comparison is consistent with what others did?
> >
> > Regards,
> > Tommaso
> >
> > On Thu, Oct 6, 2016 at 18:13 Mattmann, Chris A (3980) <[email protected]> wrote:
> >
> >> Hear, hear, great job and thanks for hosting.
> >>
> >> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >> Chris Mattmann, Ph.D.
> >> Principal Data Scientist, Engineering Administrative Office (3010)
> >> Manager, Open Source Projects Formulation and Development Office (8212)
> >> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> >> Office: 168-519, Mailstop: 168-527
> >> Email: [email protected]
> >> WWW: http://sunset.usc.edu/~mattmann/
> >> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >> Director, Information Retrieval and Data Science Group (IRDS)
> >> Adjunct Associate Professor, Computer Science Department
> >> University of Southern California, Los Angeles, CA 90089 USA
> >> WWW: http://irds.usc.edu/
> >> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>
> >> On 10/6/16, 12:49 AM, "kellen sunderland" <[email protected]> wrote:
> >>
> >> Will do, but it might be a few days before I get the time to do a proper test. Thanks for hosting, Matt.
> >>
> >> On Thu, Oct 6, 2016 at 2:19 AM, Matt Post <[email protected]> wrote:
> >>
> >>> Hi folks,
> >>>
> >>> Sorry this took so long, long story. But the four models that Hieu shared with me are ready. You can download them here; they're each about 15–20 GB.
> >>>
> >>> http://cs.jhu.edu/~post/files/joshua-hiero-ar-en.tbz
> >>> http://cs.jhu.edu/~post/files/joshua-phrase-ar-en.tbz
> >>> http://cs.jhu.edu/~post/files/joshua-hiero-ru-en.tbz
> >>> http://cs.jhu.edu/~post/files/joshua-hiero-ru-en.tbz
> >>>
> >>> It'd be great if someone could test them on a machine with lots of cores, to see how things scale.
> >>>
> >>> matt
> >>>
> >>> On Sep 22, 2016, at 9:09 AM, Matt Post <[email protected]> wrote:
> >>>
> >>> Hi folks,
> >>>
> >>> I have finished the comparison. Here you can find graphs for ar-en and ru-en. The ground-up rewrite of Moses is about 2x–3x faster than Joshua.
> >>>
> >>> http://imgur.com/a/FcIbW
> >>>
> >>> One implication (untested) is that we are likely as fast as or faster than Moses.
> >>>
> >>> We could brainstorm things to do to close this gap. I'd be much happier with 2x or even 1.5x than with 3x, and I bet we could narrow this down. But I'd like to get the 6.1 release out of the way first, so I'm pushing this off to next month. Sound cool?
> >>>
> >>> matt
> >>>
> >>> On Sep 19, 2016, at 6:26 AM, Matt Post <[email protected]> wrote:
> >>>
> >>> I can't believe I did this, but I mis-colored one of the hiero lines, and the Numbers legend doesn't show the line type.
> >>> If you reload the Dropbox file, it's fixed now. The difference is about 3x for both. Here's the table.
> >>>
> >>>   Threads   Joshua (phrase)   Moses2 (phrase)   Joshua (hiero)   Moses2 (hiero)   Phrase rate   Hiero rate
> >>>   1         178               65                2116             1137             2.74          1.86
> >>>   2         109               42                1014             389              2.60          2.61
> >>>   4         78                29                596              213              2.69          2.80
> >>>   6         72                25                473              154              2.88          3.07
> >>>
> >>> I'll put the models together and share them later today. This was on a 6-core machine, and I agree it'd be nice to test with something much higher.
> >>>
> >>> matt
> >>>
> >>> On Sep 19, 2016, at 5:33 AM, kellen sunderland <[email protected]> wrote:
> >>>
> >>> Do we just want to store these models somewhere temporarily? I've got a OneDrive account and could share the models from there (as long as they're below 500 GB or so).
> >>>
> >>> On Mon, Sep 19, 2016 at 11:32 AM, kellen sunderland <[email protected]> wrote:
> >>>
> >>> Very nice results. I think getting to within 25% of an optimized C++ decoder from a Java decoder is impressive. Great that Hieu has put in the work to make Moses2 so fast as well; that gives organizations two quite nice decoding engines to choose from, both with reasonable performance.
> >>>
> >>> Matt: I had a question about the x axis here. Is that the number of threads? We should be scaling more or less linearly with the number of threads; is that the case here? If you post the models somewhere, I can also do a quick benchmark on a machine with a few more cores.
> >>>
> >>> -Kellen
> >>>
> >>> On Mon, Sep 19, 2016 at 10:53 AM, Tommaso Teofili <[email protected]> wrote:
> >>>
> >>> On Sat, Sep 17, 2016 at 15:23 Matt Post <[email protected]> wrote:
> >>>
> >>> I'll ask Hieu; I don't anticipate any problems. One potential problem is that the models occupy about 15–20 GB; do you think Jenkins would host this?
> >>>
> >>> I'm not sure; can such models be downloaded and pruned at runtime, or do they need to exist on the Jenkins machine?
> >>>
> >>> (ru-en grammars still packing, results will probably not be in until much later today)
> >>>
> >>> matt
> >>>
> >>> On Sep 17, 2016, at 3:19 PM, Tommaso Teofili <[email protected]> wrote:
> >>>
> >>> Hi Matt,
> >>>
> >>> I think it'd be really valuable if we could repeat the same tests (given a parallel corpus is available) in the future. Any chance you can share the script / code to do that? We may even consider adding a Jenkins job dedicated to continuously monitoring performance as we work on the Joshua master branch.
> >>>
> >>> WDYT?
> >>>
> >>> Anyway, thanks for sharing the very interesting comparisons.
> >>> Regards,
> >>> Tommaso
> >>>
> >>> On Sat, Sep 17, 2016 at 12:29 Matt Post <[email protected]> wrote:
> >>>
> >>> Ugh, I think the mailing list deleted the attachment.
> >>> Here is an attempt around our censors:
> >>>
> >>> https://www.dropbox.com/s/80up63reu4q809y/ar-en-joshua-moses2.png?dl=0
> >>>
> >>> On Sep 17, 2016, at 12:21 PM, Matt Post <[email protected]> wrote:
> >>>
> >>> Hi everyone,
> >>>
> >>> One thing we did this week at MT Marathon was a speed comparison of Joshua 6.1 (release candidate) with Moses2, which is a ground-up rewrite of Moses designed for speed (see the attached paper). Moses2 is 4–6x faster than Moses phrase-based, and 100x (!) faster than Moses hiero.
> >>>
> >>> I tested using two moderate-to-large sized datasets that Hieu Hoang (CC'd) provided me with: ar-en and ru-en. Timing results are from 10,000 sentences in each corpus. The average ar-en sentence length is 7.5, and for ru-en it is 28. I only ran one test for each language, so there could be some variance if I averaged, but I think the results look pretty consistent. The timing is end-to-end (including model load times, which Moses2 tends to be a bit faster at).
> >>>
> >>> Note also that Joshua does not have lexicalized distortion, while Moses2 does. This means the BLEU scores are a bit lower for Joshua: 62.85 versus 63.49. This shouldn't really affect runtime, however.
> >>>
> >>> I'm working on the ru-en, but here are the ar-en results:
> >>>
> >>> [graph attachment; see the Dropbox link above]
> >>>
> >>> Some conclusions:
> >>>
> >>> - Hieu has done some bang-up work on the Moses2 rewrite! Joshua is in general about 3x slower than Moses2.
> >>>
> >>> - We don't have a Moses comparison, but extrapolating from Hieu's paper, it seems we might be as fast as or faster than Moses phrase-based decoding, and are a ton faster on Hiero. I'm going to send my models to Hieu so he can test on his machine, and then we'll have a better feel for this, including how it scales on a machine with many more processors.
> >>>
> >>> matt
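
For anyone wanting to reproduce the numbers above, here is a minimal sketch of the procedure Matt describes, assuming the bundled time-joshua.sh and time-moses2.sh scripts sit alongside the unpacked model directory; the tarball layout and how the scripts pick up the thread count are not spelled out in the thread, so treat those parts as assumptions.

    # Sketch only: fetch one of the shared model tarballs and run the bundled
    # timing scripts as described in the thread above.
    wget http://cs.jhu.edu/~post/files/joshua-phrase-ar-en.tbz
    tar xjf joshua-phrase-ar-en.tbz

    # Matt's timing loop: one run per decoder and thread count, with stdout
    # and stderr captured separately for later inspection.
    for n in 64 48 32 16 8 4 2 1; do
        for name in moses2 joshua; do
            echo $name $n
            bash time-$name.sh > out.$name.$n 2> log.$name.$n
        done
    done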
