Great. I did manage to locate a 64-core machine, and I have the following results for the ar-en dataset from Hieu (phrase-based: single run; hiero: best of three runs). The two rate columns are the ratio of Joshua's time to Moses2's:

Threads   Joshua   Moses2   Joshua (hiero)   Moses2 (hiero)   Phrase rate   Hiero rate
      1      306      125             3149             1313          2.45         2.40
      2      175       76             1643              700          2.30         2.35
      4      123       52              914              369          2.37         2.48
      8      103       40              508              199          2.58         2.55
     16       83       34              321              114          2.44         2.82
     32       93       31              247               71          3.00         3.48
     48       84       30              262               62          2.80         4.23

So Joshua does benefit from more threads, but its scaling seems to flatten out sooner. The speed ratio between the two systems is roughly 2–3x in favor of Moses2.

matt
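For anyone rechecking the rate columns: each is just the Joshua time divided by the corresponding Moses2 time (e.g., 306 / 125 = 2.45 at one thread). A minimal sketch of that check, assuming the table rows above are saved to a whitespace-separated file (the name timings.txt is made up):

    # Recompute the two ratio columns from the raw timings.
    # First five columns: threads, joshua, moses2, joshua_hiero, moses2_hiero.
    # NR > 1 skips the header row.
    awk 'NR > 1 { printf "%2d  phrase %.2fx  hiero %.2fx\n", $1, $2/$3, $4/$5 }' timings.txt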
> On Oct 11, 2016, at 11:22 AM, Tommaso Teofili <[email protected]> wrote:
>
> thanks Matt, I'll try it out.
>
> Regards,
> Tommaso
>
> On Mon, Oct 10, 2016 at 16:49, Matt Post <[email protected]> wrote:
>
>> Not stupid! You can use the shell script I bundled up. Here's how I ran the timing tests:
>>
>> for n in 64 48 32 16 8 4 2 1; do
>>     for name in moses2 joshua; do
>>         echo $name $n; bash time-$name.sh > out.$name.$n 2> log.$name.$n
>>     done
>> done
>>
>> matt
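The time-*.sh wrappers themselves aren't shown in the thread. As a rough guide to what one might contain, here is a minimal sketch; the decoder entry point, config path, input file, and the reliance on an exported $n are all assumptions, not the actual bundled script:

    #!/bin/bash
    # Hypothetical time-joshua.sh: one end-to-end decoding run.
    # Assumes the outer loop exports n (the thread count) and that a
    # packed model with a joshua.config sits in the current directory.
    set -e
    start=$(date +%s)
    $JOSHUA/bin/joshua-decoder -c joshua.config -threads ${n:-1} \
        < input.txt > output.txt
    echo "wall time: $(( $(date +%s) - start ))s" >&2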
>>> On Oct 10, 2016, at 6:42 AM, Tommaso Teofili <[email protected]> wrote:
>>>
>>> sorry if this is again a stupid question, but I'm still getting my head around all the possible execution options. Now that I've downloaded the above models, which scripts should I use to run/evaluate them so that the comparison is consistent with what others did?
>>>
>>> Regards,
>>> Tommaso
>>>
>>> On Thu, Oct 6, 2016 at 18:13, Mattmann, Chris A (3980) <[email protected]> wrote:
>>>
>>>> Hear, hear. Great job, and thanks for hosting.
>>>>
>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>> Chris Mattmann, Ph.D.
>>>> Principal Data Scientist, Engineering Administrative Office (3010)
>>>> Manager, Open Source Projects Formulation and Development Office (8212)
>>>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>>>> Office: 168-519, Mailstop: 168-527
>>>> Email: [email protected]
>>>> WWW: http://sunset.usc.edu/~mattmann/
>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>> Director, Information Retrieval and Data Science Group (IRDS)
>>>> Adjunct Associate Professor, Computer Science Department
>>>> University of Southern California, Los Angeles, CA 90089 USA
>>>> WWW: http://irds.usc.edu/
>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>
>>>> On 10/6/16, 12:49 AM, "kellen sunderland" <[email protected]> wrote:
>>>>
>>>> Will do, but it might be a few days before I get the time to do a proper test. Thanks for hosting, Matt.
>>>>
>>>> On Thu, Oct 6, 2016 at 2:19 AM, Matt Post <[email protected]> wrote:
>>>>
>>>>> Hi folks,
>>>>>
>>>>> Sorry this took so long; long story. But the four models that Hieu shared with me are ready. You can download them here; they're each about 15–20 GB.
>>>>>
>>>>> http://cs.jhu.edu/~post/files/joshua-hiero-ar-en.tbz
>>>>> http://cs.jhu.edu/~post/files/joshua-phrase-ar-en.tbz
>>>>> http://cs.jhu.edu/~post/files/joshua-hiero-ru-en.tbz
>>>>> http://cs.jhu.edu/~post/files/joshua-hiero-ru-en.tbz
>>>>>
>>>>> It'd be great if someone could test them on a machine with lots of cores, to see how things scale.
>>>>>
>>>>> matt
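For anyone who wants to reproduce the test end to end, grabbing and unpacking one bundle looks roughly like this; the extracted directory name and the presence of time-joshua.sh at its top level are assumptions, so check the tarball contents after extraction:

    # Fetch one of the ~15-20 GB bundles and unpack it (.tbz = bzip2 tar).
    wget http://cs.jhu.edu/~post/files/joshua-hiero-ar-en.tbz
    tar xjf joshua-hiero-ar-en.tbz
    cd joshua-hiero-ar-en         # assumed directory name
    bash time-joshua.sh > out.joshua 2> log.joshua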
>>>>> On Sep 22, 2016, at 9:09 AM, Matt Post <[email protected]> wrote:
>>>>>
>>>>> Hi folks,
>>>>>
>>>>> I have finished the comparison. Here you can find graphs for ar-en and ru-en. The ground-up rewrite of Moses is about 2x–3x faster than Joshua.
>>>>>
>>>>> http://imgur.com/a/FcIbW
>>>>>
>>>>> One implication (untested) is that we are likely as fast as or faster than Moses.
>>>>>
>>>>> We could brainstorm things to do to close this gap. I'd be much happier with 2x or even 1.5x than with 3x, and I bet we could narrow this down. But I'd like to get the 6.1 release out of the way first, so I'm pushing this off to next month. Sound cool?
>>>>>
>>>>> matt
>>>>>
>>>>> On Sep 19, 2016, at 6:26 AM, Matt Post <[email protected]> wrote:
>>>>>
>>>>> I can't believe I did this, but I mis-colored one of the hiero lines, and the Numbers legend doesn't show the line type. If you reload the Dropbox file, it's fixed now. The difference is about 3x for both. Here's the table:
>>>>>
>>>>> Threads   Joshua   Moses2   Joshua (hiero)   Moses2 (hiero)   Phrase rate   Hiero rate
>>>>>       1      178       65             2116             1137          2.74         1.86
>>>>>       2      109       42             1014              389          2.60         2.61
>>>>>       4       78       29              596              213          2.69         2.80
>>>>>       6       72       25              473              154          2.88         3.07
>>>>>
>>>>> I'll put the models together and share them later today. This was on a 6-core machine, and I agree it'd be nice to test on something much bigger.
>>>>>
>>>>> matt
>>>>>
>>>>> On Sep 19, 2016, at 5:33 AM, kellen sunderland <[email protected]> wrote:
>>>>>
>>>>> Do we just want to store these models somewhere temporarily? I've got a OneDrive account and could share the models from there (as long as they're below 500 GB or so).
>>>>>
>>>>> On Mon, Sep 19, 2016 at 11:32 AM, kellen sunderland <[email protected]> wrote:
>>>>>
>>>>> Very nice results. I think getting to within 25% of an optimized C++ decoder with a Java decoder is impressive. It's great that Hieu has put in the work to make Moses2 so fast as well; that gives organizations two quite nice decoding engines to choose from, both with reasonable performance.
>>>>>
>>>>> Matt: I had a question about the x axis here. Is that the number of threads? We should be scaling more or less linearly with the number of threads; is that the case here? If you post the models somewhere I can also do a quick benchmark on a machine with a few more cores.
>>>>>
>>>>> -Kellen
>>>>>
>>>>> On Mon, Sep 19, 2016 at 10:53 AM, Tommaso Teofili <[email protected]> wrote:
>>>>>
>>>>> On Sat, Sep 17, 2016 at 15:23, Matt Post <[email protected]> wrote:
>>>>>
>>>>>> I'll ask Hieu; I don't anticipate any problems. One potential problem is that the models occupy about 15–20 GB; do you think Jenkins would host this?
>>>>>
>>>>> I'm not sure. Can such models be downloaded and pruned at runtime, or do they need to exist on the Jenkins machine?
>>>>>
>>>>>> (ru-en grammars are still packing; results will probably not be in until much later today.)
>>>>>>
>>>>>> matt
>>>>>
>>>>> On Sep 17, 2016, at 3:19 PM, Tommaso Teofili <[email protected]> wrote:
>>>>>
>>>>> Hi Matt,
>>>>>
>>>>> I think it'd be really valuable if we could repeat the same tests (given that a parallel corpus is available) in the future. Any chance you can share the script / code to do that? We may even consider adding a Jenkins job dedicated to continuously monitoring performance as we work on the Joshua master branch.
>>>>>
>>>>> WDYT?
>>>>>
>>>>> Anyway, thanks for sharing the very interesting comparisons.
>>>>> Regards,
>>>>> Tommaso
>>>>>
>>>>> On Sat, Sep 17, 2016 at 12:29, Matt Post <[email protected]> wrote:
>>>>>
>>>>> Ugh, I think the mailing list deleted the attachment. Here is an attempt around our censors:
>>>>>
>>>>> https://www.dropbox.com/s/80up63reu4q809y/ar-en-joshua-moses2.png?dl=0
>>>>>
>>>>> On Sep 17, 2016, at 12:21 PM, Matt Post <[email protected]> wrote:
>>>>>
>>>>> Hi everyone,
>>>>>
>>>>> One thing we did this week at MT Marathon was a speed comparison of Joshua 6.1 (release candidate) with Moses2, which is a ground-up rewrite of Moses designed for speed (see the attached paper). Moses2 is 4–6x faster than Moses phrase-based, and 100x (!) faster than Moses hiero.
>>>>>
>>>>> I tested using two moderate-to-large datasets that Hieu Hoang (CC'd) provided me with: ar-en and ru-en. Timing results are from 10,000 sentences in each corpus. The average ar-en sentence length is 7.5, and for ru-en it is 28. I only ran one test for each language pair, so there would be some variance if I averaged over multiple runs, but the results look pretty consistent. The timing is end-to-end (including model load times, which Moses2 tends to be a bit faster at).
>>>>>
>>>>> Note also that Joshua does not have lexicalized distortion, while Moses2 does. This means the BLEU scores are a bit lower for Joshua: 62.85 versus 63.49. This shouldn't really affect runtime, however.
>>>>>
>>>>> I'm working on the ru-en, but here are the ar-en results:
>>>>>
>>>>> [the chart attachment was stripped by the mailing list; see the Dropbox link above]
>>>>>
>>>>> Some conclusions:
>>>>>
>>>>> - Hieu has done some bang-up work on the Moses2 rewrite! Joshua is in general about 3x slower than Moses2.
>>>>>
>>>>> - We don't have a Moses comparison, but extrapolating from Hieu's paper, it seems we might be as fast as or faster than Moses phrase-based decoding, and are a ton faster on hiero. I'm going to send my models to Hieu so he can test on his machine, and then we'll have a better feel for this, including how it scales on a machine with many more processors.
>>>>>
>>>>> matt
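On Tommaso's suggestion of a Jenkins job that continuously monitors decoding speed: here is a minimal sketch of the shell step such a job might run. Everything in it, the wrapper name, the baseline file, and the 10% regression threshold, is an assumption rather than an agreed-on setup.

    # Hypothetical Jenkins shell step: time a fixed decoding run on the
    # current master and fail the build if it regresses past a threshold.
    set -e
    baseline=$(cat baseline_seconds.txt)   # recorded from an earlier accepted run
    start=$(date +%s)
    bash time-joshua.sh > out.joshua 2> log.joshua
    elapsed=$(( $(date +%s) - start ))
    echo "decoding took ${elapsed}s (baseline ${baseline}s)"
    # fail if more than 10% slower than the stored baseline
    if [ "$elapsed" -gt $(( baseline * 110 / 100 )) ]; then
        echo "performance regression detected" >&2
        exit 1
    fi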
