Hi, yes, it is correct that step 1 only does the data preparation for GIZA++. The most time-consuming part is running mkcls to create the word classes for the relative distortion models.
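
To make the "dictionaries and numerical sentences" part concrete, here is a minimal Python sketch of that kind of preparation. It is not the Moses training script itself. The function names, the choice to start word IDs at 2, and the use of ID 1 for unknown words are assumptions made for illustration. The mkcls word-class step is not covered here.

```python
from collections import Counter

def build_vocab(sentences, first_id=2):
    """Map each word to (id, count); the most frequent words get the lowest IDs.

    Starting at 2 is an assumption for this sketch (ID 1 is kept for unknowns).
    """
    counts = Counter(tok for s in sentences for tok in s.split())
    # Sort by descending count, then alphabetically so the result is deterministic.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {w: (i, c) for i, (w, c) in enumerate(ordered, start=first_id)}

def numberize(sentence, vocab, unk_id=1):
    """Replace each token with its integer ID, using unk_id for unseen words."""
    return [vocab[tok][0] if tok in vocab else unk_id for tok in sentence.split()]

corpus = ["the house is small", "the house is big"]
vocab = build_vocab(corpus)
ids = [numberize(s, vocab) for s in corpus]

for word, (word_id, count) in sorted(vocab.items(), key=lambda kv: kv[1][0]):
    print(word_id, word, count)
print(ids)
```

Both passes are linear in the corpus size, which fits the observation that this part is cheap compared to the mkcls run.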
-phi

On Mon, Aug 31, 2009 at 4:39 PM, James Read<[email protected]> wrote:
> Hi,
>
> does anyone know what step 1 of the moses training script does other
> than produce the dictionaries and the numerical sentences that enable
> GIZA++ to do its job? The reason I ask is that on my machine step 1
> takes just over 70 mins for en-fr Europarl corpus.
>
> My optimised version of data preparation and EM IBM Model 1 completes
> in 121 seconds for five iterations of EM, that's just over 2 minutes.
> Before publishing these results I just wanted to make sure there's
> nothing I've missed about step 1 of the training process. Does it do
> anything at all that influences GIZA++ other than preparing the
> digital sentences?
>
> James
>
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
