it is this:

Abby Levenberg, Chris Callison-Burch and Miles Osborne. Stream-based
Translation Models for Statistical Machine Translation. NAACL, Los
Angeles, USA, 2010.
http://homepages.inf.ed.ac.uk/miles/papers/naacl10b.pdf
Miles

On 15 June 2011 19:28, Qin Gao <[email protected]> wrote:
> Yes, MGIZA isn't really "incrementally training"; it only initializes the
> model parameters with those trained previously, since it does not store
> the sufficient statistics of the previous training. It will give bad
> performance if
>
> 1. you train only Model 1, or
> 2. the incremental data or subset is really small.
>
> It is more suitable for the following scenario: you have trained a model
> on corpus A, you have new data B, and you want to train several
> iterations of Model 4 on A+B.
>
> For the incremental-training GIZA, do you know whether it uses online EM
> (as in Liang and Klein 2009) or just stores the sufficient statistics of
> the previous training?
>
> --Q
>
> On Wed, Jun 15, 2011 at 11:07 AM, Miles Osborne <[email protected]> wrote:
>
>> that isn't the expected answer here. i think the OP wants some kind of
>> incremental (re)training.
>>
>> firstly: it is not really possible to guarantee that performance is not
>> degraded when running from subsets up to the full set (compared with
>> just running on the full set).
>>
>> secondly, you may wish to investigate a version of GIZA which supports
>> incremental retraining. this would allow you to train on a subset and
>> then add more and more data, without retraining from scratch at each
>> point. the current version has minimal documentation, but right now this
>> is hopefully being fixed. if you are feeling brave, look here:
>>
>> http://code.google.com/p/inc-giza-pp/
>>
>> Miles
>>
>> On 15 June 2011 18:50, Kenneth Heafield <[email protected]> wrote:
>>
>>> Try using MGIZA: http://geek.kyloo.net/software/doku.php/mgiza:overview
>>>
>>> On 06/15/11 04:51, Prasanth K wrote:
>>>> Hello All,
>>>>
>>>> I am conducting a series of experiments to build translation systems
>>>> using Moses, in which the corpus of the current experiment is a subset
>>>> of the corpora used in the previous experiment. I have started with
>>>> the Europarl corpora and am likely to repeat this process about 20
>>>> times. Unless I am mistaken, this is going to take me nearly a month,
>>>> and I am looking for ways to speed up the whole process.
>>>>
>>>> Is there an optimal way to run GIZA++ on these different subsets of
>>>> the data without having to run it again and again? I do not want to
>>>> reuse the alignments obtained from running GIZA++ on the entire
>>>> Europarl corpora for the other experiments (by selecting the alignment
>>>> information from aligned.grow-diag-final-and for the sentences in the
>>>> subsets).
>>>>
>>>> The order of the experiments does not matter, so the experiments can
>>>> be done on the smallest dataset followed by supersets of the previous
>>>> dataset, provided there is a way to update the translation
>>>> probabilities from GIZA++ using just the additional data alone, and
>>>> provided this does not hurt performance compared to running GIZA++ on
>>>> the corpus in stand-alone mode.
>>>>
>>>> Kindly let me know if there is some way to do this that I am missing.
>>>>
>>>> - regards,
>>>> Prasanth
>>>>
>>>> --
>>>> "Theories have four stages of acceptance. i) this is worthless
>>>> nonsense; ii) this is an interesting, but perverse, point of view;
>>>> iii) this is true, but quite unimportant; iv) I always said so."
>>>>
>>>> --- J.B.S. Haldane
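To see concretely what the "warm start" Qin Gao describes does and does
not carry over, here is a minimal sketch, assuming IBM Model 1 only
(GIZA++ and MGIZA train a whole sequence of models, and nothing below --
the name model1_em, the data layout -- is their actual interface). The
translation table from an earlier run seeds EM on the combined corpus, so
fewer iterations are needed, but no sufficient statistics from the
earlier run survive.

    from collections import defaultdict

    def model1_em(bitext, t=None, iterations=5, floor=1e-12):
        # One run of IBM Model 1 EM over bitext, a list of
        # (foreign_tokens, english_tokens) pairs.  If t -- a table
        # mapping (f, e) -> probability returned by an earlier call --
        # is supplied, EM starts from those parameters instead of
        # uniform ones: a warm start, with no expected counts kept.
        if t is None:
            t = defaultdict(lambda: 1.0)   # uniform up to normalization
        for _ in range(iterations):
            count = defaultdict(float)     # E-step: expected counts c(f, e)
            total = defaultdict(float)     # marginals for normalization
            for f_sent, e_sent in bitext:
                for f in f_sent:
                    z = sum(t[(f, e)] for e in e_sent)
                    for e in e_sent:
                        p = t[(f, e)] / z
                        count[(f, e)] += p
                        total[e] += p
            # M-step: re-estimate t(f | e) from the expected counts
            t = defaultdict(lambda: floor,
                            {fe: c / total[fe[1]] for fe, c in count.items()})
        return t

A hypothetical rerun on A+B would then look like:

    t_A  = model1_em(corpus_A)                  # original training run
    t_AB = model1_em(corpus_A + corpus_B,
                     t=t_A, iterations=2)       # warm-started rerun

Note that the warm-started rerun still iterates over all of A+B; the warm
start only buys fewer iterations. That matches Qin Gao's caveats: with
only Model 1, or with a tiny B, the saving is small and the result can be
worse than training from scratch.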
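For contrast, here is a sketch of the stepwise online EM that Qin Gao
asks about (Liang and Klein 2009), again cut down to IBM Model 1. Whether
inc-giza-pp actually works this way is exactly the open question in the
thread, so treat everything here -- the mini-batch scheme, the stepsize
(k + 2)^-alpha, the function name -- as an assumption, not a description
of that code. The point is that the running sufficient statistics are
kept, so new data can be folded in without revisiting the old corpus.

    from collections import defaultdict

    def model1_stepwise_em(stream, t, counts, totals, k=0,
                           alpha=0.7, batch_size=1000):
        # Stepwise online EM in the style of Liang and Klein (2009),
        # sketched for IBM Model 1.  counts and totals are the running
        # sufficient statistics over all data seen so far; k is how many
        # mini-batches they already summarize.  New sentence pairs from
        # stream are folded in with a decaying stepsize, so earlier data
        # never has to be revisited.
        batch = []
        for pair in stream:
            batch.append(pair)
            if len(batch) < batch_size:
                continue
            bc = defaultdict(float)        # E-step over this batch only
            bt = defaultdict(float)
            for f_sent, e_sent in batch:
                for f in f_sent:
                    z = sum(t.get((f, e), 1e-12) for e in e_sent)
                    for e in e_sent:
                        p = t.get((f, e), 1e-12) / z
                        bc[(f, e)] += p
                        bt[e] += p
            # Interpolate the running statistics toward the batch ones
            eta = (k + 2) ** (-alpha)      # stepsize, alpha in (0.5, 1]
            for d in (counts, totals):
                for key in d:
                    d[key] *= 1.0 - eta
            for fe, v in bc.items():
                counts[fe] += eta * v
            for e, v in bt.items():
                totals[e] += eta * v
            # M-step from the interpolated sufficient statistics
            t = {fe: c / totals[fe[1]] for fe, c in counts.items()}
            k += 1
            batch = []                     # trailing partial batch dropped
        return t, counts, totals, k

Both counts and totals should be defaultdict(float); starting empty and
feeding corpus A, then later corpus B, never touches A again:

    from collections import defaultdict
    t, counts, totals, k = model1_stepwise_em(
        corpus_A, t={}, counts=defaultdict(float),
        totals=defaultdict(float))
    t, counts, totals, k = model1_stepwise_em(
        corpus_B, t, counts, totals, k)    # B only; A is not re-read

which is the property the original poster is after.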
