Rico,
Thanks for the information. I think this is exactly the workflow I
need.
--
Taylor Rose
Machine Translation Intern
Language Intelligence
IRC: Handle: trose
Server: freenode
On Tue, 2012-03-20 at 09:12 +0000, Rico Sennrich wrote:
> Taylor Rose <trose@...> writes:
>
> >
> > Hello,
> >
> > I'm trying to build a system where I can take a general speech corpus
> > for a given language pair and add specific information to it for certain
> > applications. For instance, I may want to add information about computer
> > technologies or biology terms to my general corpus. In this way I hope
> > to create domain-specific translation modules quickly.
> > My current system for doing this is quite slow and I have some ideas
> > about how to improve it but I'm not sure if they're possible.
> >
> > My current system is as follows: I keep a large generalized corpus stored
> > on my server. When I want to create a new translation module I combine a
> > smaller corpus with my large corpus and then go through the entire
> > cleaning and training process with moses. This ends up taking a very
> > long time when I'm really only adding a small amount of new information.
> >
> > What I would like to do is to train the large corpus on its own and
> > then modify it with the smaller domain specific corpora. I think this
> > would save me a huge amount of time since I would only ever have to
> > train the large corpus once. Is this sort of thing at all possible? If
> > moses keeps track of instance counts of n-grams I think this would be
> > trivial. I'm just not sure if it actually does that.
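(A toy illustration of the idea behind the question above — not how Moses actually stores its models: if a system keeps raw n-gram instance counts, folding in a small domain corpus is just adding counts, with no need to reprocess the large general corpus.)

```python
from collections import Counter

def ngram_counts(tokens, n=2):
    """Count n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

# "Train" on the large general corpus once.
general = "the cat sat on the mat".split()
counts = ngram_counts(general)

# Later, fold in a small domain corpus by merging counts --
# the general corpus is never touched again.
domain = "the gene sat in the cell".split()
counts.update(ngram_counts(domain))

# Re-derive relative frequencies from the combined counts.
total = sum(counts.values())
prob = {ng: c / total for ng, c in counts.items()}
```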
> >
> > Thanks for any help or advice you can provide,
>
> Hi Taylor,
>
> Incremental training might be what you're looking for:
> http://www.statmt.org/moses/?n=Moses.AdvancedFeatures#ntoc27
>
> Alternatively, there's a script in /contrib/tmcombine that allows you to
> perform a weighted combination of phrase tables. The general idea would be
> to train models on the individual corpora, then obtain a combined model
> through the tmcombine script.
>
> Especially if you prune your phrase tables first (/contrib/sigtest-filter),
> it saves you time over re-doing the whole training procedure. The script's
> main aim, however, is to give you better SMT results by optimizing the
> weights of each of the models that you combine.
>
> best wishes,
> Rico
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support