Hi,

Quoting Philipp Koehn <[email protected]>:

> Hi,
>
> yes, it is correct that step 1 is doing just the data preparation for
> GIZA++. The most time-consuming step is running mkcls to create the
> classes for the relative distortion models.

Do you mean the *.vcb files that are created in step 1? These just look
like dictionary files with three fields: a) a numeric ID, b) the word
entry, c) the frequency of the string. My make_dictionary function does
this in about 20 seconds. Why is mkcls taking so long? Is it doing
something complicated that I have missed here?

James

> -phi
>
> On Mon, Aug 31, 2009 at 4:39 PM, James Read <[email protected]> wrote:
>> Hi,
>>
>> does anyone know what step 1 of the Moses training script does other
>> than produce the dictionaries and the numerical sentences that enable
>> GIZA++ to do its job? The reason I ask is that on my machine step 1
>> takes just over 70 minutes for the en-fr Europarl corpus.
>>
>> My optimised version of data preparation and EM IBM Model 1 completes
>> in 121 seconds for five iterations of EM, that's just over 2 minutes.
>> Before publishing these results I just wanted to make sure there's
>> nothing I've missed about step 1 of the training process. Does it do
>> anything at all that influences GIZA++ other than preparing the
>> digital sentences?
>>
>> James
>>
>> --
>> The University of Edinburgh is a charitable body, registered in
>> Scotland, with registration number SC005336.
>>
>> _______________________________________________
>> Moses-support mailing list
>> [email protected]
>> http://mailman.mit.edu/mailman/listinfo/moses-support
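P.S. To make the *.vcb comparison concrete, here is a minimal Python sketch of a make_dictionary-style routine that emits (numeric ID, word, frequency) records. The function name, the whitespace tokenization, and the ID numbering scheme are illustrative assumptions, not the actual GIZA++/plain2snt code:

```python
# Minimal sketch: build *.vcb-style records (numeric ID, word, frequency)
# from whitespace-tokenized sentences. The numbering below (starting at 2,
# in frequency order) is an illustrative assumption, not necessarily what
# GIZA++'s own tools produce.
from collections import Counter

def make_dictionary(sentences):
    """Return a list of (id, word, frequency) tuples for the corpus."""
    counts = Counter(tok for sent in sentences for tok in sent.split())
    return [(wid, word, freq)
            for wid, (word, freq) in enumerate(counts.most_common(), start=2)]

corpus = ["the cat sat", "the dog sat"]
for wid, word, freq in make_dictionary(corpus):
    print(f"{wid} {word} {freq}")
```

This is a single counting pass over the corpus, which is why it finishes in seconds; mkcls, by contrast, is doing iterative word clustering on top of the vocabulary, a different and much more expensive computation.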
