Hi Doren,

I've used SRILM to generate POS LMs. The LM, as you might expect, needs to be trained on a corpus consisting of sequences of POS tags instead of sequences of surface forms, e.g. instead of "The cat sat on the mat" the corpus should contain "DET N V P DET N" or whatever.

Furthermore, the set of POS tags is probably small as vocabularies go, so smoothing methods that rely on counts-of-counts, such as Kneser-Ney, are inappropriate. The SRILM website's FAQ <http://www.speech.sri.com/projects/srilm/manpages/srilm-faq.7.html> recommends Witten-Bell discounting (command-line option '-wbdiscount') for such cases; see question C3, answer (b) in the FAQ. Also, because the vocabulary is small, you can get away with using higher-order n-grams than you would for a surface LM.

Other than that, it's the same as preparing a surface LM.

Regards,
Ben
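P.S. For concreteness, the training invocation would look something like the sketch below; the file names, and the choice of order 5, are just illustrative assumptions, not fixed requirements:

    ngram-count -order 5 -wbdiscount -text train.pos -lm pos.lm   # train.pos, pos.lm and -order 5 are illustrative

Here train.pos would hold one sentence of space-separated POS tags per line (e.g. "DET N V P DET N"), and pos.lm is the resulting ARPA-format language model.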
