Taylor Rose <trose@...> writes:

> Hello,
>
> I'm trying to build a system where I can take a general speech corpus
> for a given language pair and add specific information to it for certain
> applications. For instance, I may want to add information about computer
> technologies or biology terms to my general corpus. In this way I hope
> to be able to create domain-specific translation modules quickly and easily.
> My current system for doing this is quite slow, and I have some ideas
> about how to improve it, but I'm not sure if they're possible.
>
> My current system is as follows: I keep a large general corpus stored
> on my server. When I want to create a new translation module, I combine a
> smaller corpus with my large corpus and then go through the entire
> cleaning and training process with Moses. This ends up taking a very
> long time when I'm really only adding a small amount of new information.
>
> What I would like to do is to train on the large corpus on its own and
> then update the resulting model with the smaller domain-specific corpora.
> I think this would save me a huge amount of time, since I would only ever
> have to train on the large corpus once. Is this sort of thing at all
> possible? If Moses keeps track of instance counts of n-grams, I think this
> would be trivial. I'm just not sure if it actually does that.
>
> Thanks for any help or advice you can provide,
Hi Taylor,

Incremental training might be what you're looking for:
http://www.statmt.org/moses/?n=Moses.AdvancedFeatures#ntoc27

Alternatively, there's a script in /contrib/tmcombine that lets you perform a weighted combination of phrase tables. The general idea is to train a separate model on each corpus, then obtain a combined model with the tmcombine script. Especially if you prune your phrase tables first (/contrib/sigtest-filter), this saves time compared with re-running the whole training procedure. The script's main aim, however, is to give you better SMT results by optimizing the weights of each of the models that you combine.

Best wishes,
Rico

_______________________________________________
Moses-support mailing list
http://mailman.mit.edu/mailman/listinfo/moses-support
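To make "weighted combination of phrase tables" concrete, here is a minimal Python sketch of linear interpolation of phrase-table scores. It is not the tmcombine code and does not reproduce its weight optimization; it assumes a simplified Moses-style text format (`source ||| target ||| scores`, with any later fields ignored) and fixed, hand-chosen weights. The function names `parse_table`, `interpolate` and `format_table` are hypothetical.

```python
"""Hypothetical sketch of linear phrase-table interpolation (not tmcombine).

Assumes a simplified Moses-style text format, one entry per line:
    source ||| target ||| score1 score2 ...
Any fields after the scores (alignments, counts) are ignored.
"""

from typing import Dict, Iterable, List, Tuple

PhrasePair = Tuple[str, str]
Table = Dict[PhrasePair, List[float]]


def parse_table(lines: Iterable[str]) -> Table:
    """Parse '|||'-separated lines into {(source, target): [scores]}."""
    table: Table = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = [f.strip() for f in line.split("|||")]
        if len(fields) < 3:
            raise ValueError(f"malformed phrase-table line: {line!r}")
        table[(fields[0], fields[1])] = [float(x) for x in fields[2].split()]
    return table


def interpolate(tables: List[Table], weights: List[float]) -> Table:
    """Weighted sum of each score across tables.

    Weights are normalized to sum to 1. A phrase pair missing from a table
    is treated as having score 0 in that table (a simplifying assumption).
    All tables are assumed to carry the same number of scores per pair.
    """
    if len(tables) != len(weights):
        raise ValueError("need exactly one weight per table")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    combined: Table = {}
    for table, weight in zip(tables, weights):
        w = weight / total
        for pair, scores in table.items():
            acc = combined.setdefault(pair, [0.0] * len(scores))
            for i, s in enumerate(scores):
                acc[i] += w * s
    return combined


def format_table(table: Table) -> List[str]:
    """Write the combined table back out in the same simplified format."""
    return [
        f"{src} ||| {tgt} ||| " + " ".join(f"{s:.6g}" for s in scores)
        for (src, tgt), scores in sorted(table.items())
    ]


if __name__ == "__main__":
    general = parse_table(["haus ||| house ||| 0.8 0.6 0.7 0.5"])
    domain = parse_table([
        "haus ||| house ||| 0.4 0.2 0.3 0.1",
        "maus ||| mouse ||| 1 1 1 1",
    ])
    for line in format_table(interpolate([general, domain], [3, 1])):
        print(line)
```

With weights 3:1, the `maus ||| mouse` pair, which appears only in the small domain table, ends up with every score at 0.25. Note that fixed-weight interpolation is not the same as pooling raw counts: re-estimating a phrase translation probability from summed counts is equivalent to interpolating with weights that vary with how often each source phrase occurs in each corpus, which is closer to the count-based idea raised in the question.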
