Lane Schwartz <dowobeha@...> writes:

> I have a number of distinct monolingual corpora. I've been training them
> as separate LMs. I now want to run a variant where they are all
> concatenated together, and then trained as a single LM. The EMS
> walkthrough says this should be possible
> (http://www.statmt.org/moses/?n=FactoredTraining.EMS#ntoc19), but doesn't
> give the requisite syntax. What is the EMS syntax to do this?
>
> Thanks,
> Lane

Hi Lane,

I tried to solve the problem quickly on Monday, but that didn't turn out
too well (see the next few commits fixing bugs with it). I was also unhappy
that I couldn't have multiple CONCATENATED-LMs on the same corpus, or define
which corpora to concatenate. This implementation solves that. Assume you
have these two LMs defined:

  [LM:parallelA]
  raw-corpus = /some/path

  [LM:parallelB]
  raw-corpus = /some/path
  order = 5

We can then train a second LM on the data of parallelA, but with different
settings, like this:

  [LM:parallelA2]
  stripped-corpus = [LM:parallelA:stripped-corpus]
  exclude-from-interpolation = true
  order = 6

[This was actually possible before, but I've added the property
'exclude-from-interpolation', which tells INTERPOLATED-LM to skip this LM.]

If you want an LM on concatenated data, you can define it like this:

  [LM:parallelAB]
  concatenate-files = [LM:{parallelA,parallelB}:stripped-corpus]
  exclude-from-interpolation = true
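
As a sketch of choosing which corpora to concatenate (the section names and
paths below are made up, and I'm assuming the brace list can name any subset
of the defined LMs), this would build one LM on two out of three corpora:

  # hypothetical names and paths, for illustration only
  [LM:news]
  raw-corpus = /path/to/news

  [LM:europarl]
  raw-corpus = /path/to/europarl

  [LM:web]
  raw-corpus = /path/to/web

  # concatenates news and europarl only; web is left out
  [LM:newsEuroparl]
  concatenate-files = [LM:{news,europarl}:stripped-corpus]
  exclude-from-interpolation = true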

Finally, you can also use 'custom-training' to train a language model that
train-model.perl doesn't know about, like NPLM. You'll also have to define
how the model should be added to moses.ini:

  [LM:parallelAB]
  stripped-corpus = [LM:parallelAB:stripped-corpus]
  custom-training = "my_training_script.sh -order 5 -some_setting 8"
  config-feature-line = "NPLM path=/some/path order=5 some-setting=8"
  config-weight-line = "NPLM0= 0.1"



_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
