Hi Doren,

I've used SRILM to generate POS LMs.  The LM, as you might expect, needs to
be trained on a corpus consisting of sequences of POSes instead of
sequences of surface forms, e.g. instead of

 The cat sat on the mat

the corpus should contain

 DET N V P DET N

or whatever.
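
Assuming the tagged corpus has one word|POS token per word (that's just an
assumed format, and the file names below are only examples; adjust for
whatever your tagger actually produces), something along these lines would
strip off the surface forms:

 # keep only the tag after each '|'
 sed 's/[^ |]*|//g' tagged-corpus.txt > corpus.pos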

Furthermore, the set of POSes is probably small as vocabularies go, so
smoothing methods that rely on counts-of-counts, such as Kneser-Ney, are
inappropriate.  The SRILM website's FAQ
(http://www.speech.sri.com/projects/srilm/manpages/srilm-faq.7.html)
recommends Witten-Bell discounting (command-line option '-wbdiscount') for
such cases.  (See question C3, answer (b) in the FAQ.)

Also, because the vocabulary is small, you can get away with using
higher-order n-grams than you would use for a surface LM.
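
Putting those two points together, the ngram-count invocation would look
something like this (the order and file names are just placeholders):

 # 5-gram POS LM with Witten-Bell discounting
 ngram-count -order 5 -wbdiscount -text corpus.pos -lm pos.lm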

Other than that, it's the same as preparing a surface LM.

Regards,
Ben
