Hi,

I have recently been trying to create incrementally adapted language models
using IRSTLM.

I have an in-domain data set on which the mixture-adapted weights are
computed using the -lm=mix option, and I have a larger out-of-domain data set
from which I incrementally add data to create adapted LMs of different sizes.

Currently, every time saveBIN is called, the entire lmtable is estimated
and saved again, which makes the process slow.

Is there any functionality in IRSTLM to incrementally train and save adapted
language models?

Secondly, given an existing adapted language model in ARPA format (old) and
another small language model built on incremental data (new), would it be
safe to update the smoothed probabilities (fstar) using the following
formula:

c_sum(wh) = c_old(wh) + c_new(wh)
f*_sum(w|h) = f*_old(w|h)*(c_old(wh)/c_sum(wh))
            + f*_new(w|h)*(c_new(wh)/c_sum(wh))

where the c_old and c_new counts are estimated from the ngram tables?
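
To make the proposed update concrete, here is a minimal Python sketch of the
formula above for a single n-gram. It is only an illustration of the
arithmetic: the function name merge_fstar and the example numbers are
hypothetical and are not part of IRSTLM, and the sketch says nothing about
whether the merged values still sum to one over w or how the back-off weights
would need to be recomputed.

```python
def merge_fstar(f_old, c_old, f_new, c_new):
    """Count-weighted combination of two smoothed probabilities f*(w|h).

    f_old, f_new: smoothed probabilities of the same n-gram (w|h) in the
                  old and new models.
    c_old, c_new: counts of the n-gram wh in the old and new ngram tables.
    All names here are hypothetical; this is not an IRSTLM function.
    """
    c_sum = c_old + c_new
    if c_sum == 0:
        # The n-gram was seen in neither table, so the weights are undefined.
        raise ValueError("n-gram unseen in both tables; nothing to combine")
    return f_old * (c_old / c_sum) + f_new * (c_new / c_sum)


if __name__ == "__main__":
    # Made-up numbers: wh seen 30 times in the old data, 10 in the new data.
    print(merge_fstar(0.2, 30, 0.4, 10))
```

With these made-up numbers the old estimate gets weight 30/40 and the new one
10/40, so the result is about 0.25. An n-gram that is missing from the new
table (c_new = 0) simply keeps its old value.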


Thanks and Regards,

Pratyush
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
