Try using MGIZA: http://geek.kyloo.net/software/doku.php/mgiza:overview
On 06/15/11 04:51, Prasanth K wrote:
> Hello All,
>
> I am conducting a series of experiments to build translation systems
> using Moses, in which the corpus of each experiment is a subset of
> the corpus used in the previous experiment. I have started with the
> Europarl corpora and am likely to repeat this process about 20 times.
> Unless I am mistaken, this is going to take me nearly a month, and I am
> looking for ways to speed up the whole process.
>
> Is there an efficient way to run Giza++ on these different subsets of
> data without having to run it again and again?
> "I do not want to use the alignments obtained from running Giza++ on the
> entire Europarl corpora for the other experiments (by selecting the
> alignment information from aligned.grow-diag-final-and for the sentences
> in the subsets)."
>
> The order of the experiments does not matter, so they can be run on
> the smallest dataset first, followed by successive supersets of the
> previous dataset, provided there is a way to update the translation
> probabilities from Giza++ using only the additional data, and provided
> this does not degrade the result compared with running Giza++ on the
> same corpus in stand-alone mode.
>
> Kindly let me know if there is some way to do this that I am missing.
>
> - regards,
> Prasanth
>
> --
> "Theories have four stages of acceptance. i) this is worthless nonsense;
> ii) this is an interesting, but perverse, point of view; iii) this is
> true, but quite unimportant; iv) I always said so."
>
> --- J.B.S. Haldane
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
