Thanks, Rico! For now, I ended up just manually cat'ing the files, but this
should be very useful in the future.
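For reference, the manual workaround amounts to concatenating the corpora, in order, into one file before training a single LM. Here is a minimal Python sketch, assuming plain-text corpora with one sentence per line. The helper name `concatenate_corpora` is hypothetical and not part of Moses.

```python
def concatenate_corpora(inputs, output):
    """Concatenate corpus files, in the given order, into one LM training file.

    Hypothetical helper mirroring `cat a.txt b.txt > all.txt`.
    """
    with open(output, "wb") as out:
        for path in inputs:
            last = b"\n"  # treat an empty file as already newline-terminated
            with open(path, "rb") as src:
                for line in src:  # stream line by line; corpora can be large
                    out.write(line)
                    last = line[-1:]
            if last != b"\n":
                # Keep the final sentence of this file on its own line.
                out.write(b"\n")
```

Unlike a bare `cat`, this adds a newline when a file lacks a trailing one, so the last sentence of one corpus is not glued to the first sentence of the next.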

On Wed, May 20, 2015 at 11:40 AM, Rico Sennrich <[email protected]>
wrote:

> Lane Schwartz <dowobeha@...> writes:
>
> >
> > I have a number of distinct monolingual corpora. I've been training them
> > as separate LMs. I now want to run a variant where they are all
> > concatenated together, and then trained as a single LM. The EMS
> > walkthrough says this should be possible
> > (http://www.statmt.org/moses/?n=FactoredTraining.EMS#ntoc19), but doesn't
> > give the requisite syntax. What is the EMS syntax to do this?
> >
> > Thanks,
> > Lane
>
> Hi Lane,
>
> I tried to solve the problem quickly on Monday, but that didn't turn out
> too well (see the next few commits fixing bugs in it). I was also unhappy
> that I couldn't have multiple CONCATENATED-LMs on the same corpus, or
> define which corpora to concatenate. This implementation solves that.
> Assume you have these two LMs defined:
>
>   [LM:parallelA]
>   raw-corpus = /some/path
>
>   [LM:parallelB]
>   raw-corpus = /some/path
>   order = 5
>
> We can have a second LM trained on the data of parallelA, but with
> different settings, like this:
>
>   [LM:parallelA2]
>
>   stripped-corpus = [LM:parallelA:stripped-corpus]
>   exclude-from-interpolation = true
>   order = 6
>
> [This was actually possible before, but I've added the property
> 'exclude-from-interpolation', which tells INTERPOLATED-LM to skip this LM.]
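To make the effect of 'exclude-from-interpolation' concrete, here is a rough Python sketch that lists which LM sections INTERPOLATED-LM would still combine. It assumes the EMS config can be approximated with Python's configparser, which is not how EMS itself parses its files, and the function name `lms_to_interpolate` is hypothetical.

```python
import configparser


def lms_to_interpolate(ems_config):
    """Return names of [LM:...] sections that INTERPOLATED-LM would combine.

    Approximation: EMS config files look INI-like, so configparser is used
    here for illustration only; it is not the parser EMS itself uses.
    """
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.read_string(ems_config)
    names = []
    for section in cfg.sections():
        if not section.startswith("LM:"):
            continue
        # Skip LMs flagged with exclude-from-interpolation = true.
        if cfg[section].getboolean("exclude-from-interpolation", fallback=False):
            continue
        names.append(section.split(":", 1)[1])
    return names
```

Fed the parallelA, parallelB and parallelA2 sections above (without the email indentation), this returns only parallelA and parallelB.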
>
> If you want an LM on concatenated data, you can define it like this:
>
>   [LM:parallelAB]
>
>   concatenate-files = [LM:{parallelA,parallelB}:stripped-corpus]
>   exclude-from-interpolation = true
>
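The brace syntax in concatenate-files presumably expands to one reference per listed LM, i.e. the stripped corpora of parallelA and parallelB, which are then concatenated. A hypothetical Python sketch of that expansion; it handles a single {...} group only, and `expand_references` is an illustrative name, not EMS code.

```python
import re


def expand_references(value):
    """Expand one brace group in an EMS-style reference.

    [LM:{parallelA,parallelB}:stripped-corpus] becomes
    ['[LM:parallelA:stripped-corpus]', '[LM:parallelB:stripped-corpus]'].
    Illustrative only: assumes at most one {...} group.
    """
    m = re.fullmatch(r"\[([^:\]]+):\{([^}]*)\}:([^\]]+)\]", value.strip())
    if m is None:
        # No brace group: the reference stands for a single file.
        return [value.strip()]
    module, names, output = m.groups()
    return [f"[{module}:{name.strip()}:{output}]" for name in names.split(",")]
```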
> Finally, you can also use 'custom-training' to train a language model that
> train-model.perl doesn't know about, such as NPLM. You'll also have to
> define how the model should be added to moses.ini:
>
>   [LM:parallelAB]
>
>   stripped-corpus = [LM:parallelAB:stripped-corpus]
>   custom-training = "my_training_script.sh -order 5 -some_setting 8"
>   config-feature-line = "NPLM path=/some/path order=5 some-setting=8"
>   config-weight-line = "NPLM0= 0.1"
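What config-feature-line and config-weight-line request can be pictured as adding one line to the [feature] section and one to the [weight] section of moses.ini. A minimal Python sketch under that assumption; the helper `add_feature_to_moses_ini` is hypothetical, and EMS's actual placement logic may differ.

```python
def add_feature_to_moses_ini(ini_text, feature_line, weight_line):
    """Add one line to the [feature] section and one to the [weight] section.

    Hypothetical helper illustrating what config-feature-line and
    config-weight-line request; EMS's real placement logic may differ.
    """
    pending = {"[feature]": feature_line, "[weight]": weight_line}
    result = []
    section = None

    def flush():
        # Insert the pending line after the last non-blank line of the
        # section that is ending, so blank separators stay in place.
        if section in pending:
            pos = len(result)
            while pos > 0 and not result[pos - 1].strip():
                pos -= 1
            result.insert(pos, pending.pop(section))

    for line in ini_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            flush()
            section = stripped
        result.append(line)
    flush()

    # Any section the file lacked is appended at the end.
    for header, extra in pending.items():
        result += ["", header, extra]
    return "\n".join(result) + "\n"
```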
>
>
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
>



-- 
When a place gets crowded enough to require ID's, social collapse is not
far away.  It is time to go elsewhere.  The best thing about space travel
is that it made it possible to go elsewhere.
                -- R.A. Heinlein, "Time Enough For Love"