Pardon my ignorance. What do you mean by language model?
A language model is a statistical model estimated from a data set; it assigns probabilities to sequences of words. Here I think the OP is talking about creating a language model for speech processing. An n-gram model is one kind of language model: http://en.wikipedia.org/wiki/N-gram

> And by Tamil-corpus do you mean a large collection of tamil text ?

In the context of Natural Language Processing, a corpus is a large collection of text. There are different types of corpora, such as text corpora, speech corpora and image corpora. Here the OP requires a text corpus. I think he can use the Tamil Wikipedia dump as a corpus for his research. Alternatively, he can build a corpus from newspaper RSS feeds and Tamil blog feeds.

--
**********************************
JAGANADH G
http://jaganadhg.in
*ILUGCBE*
http://ilugcbe.org.in
_______________________________________________
ILUGC Mailing List:
http://www.ae.iitm.ac.in/mailman/listinfo/ilugc
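
To make the n-gram idea concrete, here is a minimal sketch of a bigram (2-gram) model in Python. The function name `train_bigram_model`, the `<s>` and `</s>` boundary markers and the three toy sentences are my own assumptions for illustration, not anything the OP described. A real model would be trained on a large corpus (for example text extracted from the Tamil Wikipedia dump) and would need smoothing to handle word pairs it has never seen.

```python
from collections import Counter, defaultdict


def train_bigram_model(sentences):
    """Estimate P(next_word | word) by relative frequency from tokenized sentences."""
    pair_counts = defaultdict(Counter)
    for tokens in sentences:
        # Hypothetical boundary markers so sentence starts and ends are modelled too.
        padded = ["<s>"] + tokens + ["</s>"]
        for prev, cur in zip(padded, padded[1:]):
            pair_counts[prev][cur] += 1

    model = {}
    for prev, followers in pair_counts.items():
        total = sum(followers.values())
        # Unsmoothed estimate: count(prev, cur) / count(prev, anything).
        model[prev] = {word: count / total for word, count in followers.items()}
    return model


# Toy corpus made up for illustration; whitespace splitting stands in for a real tokenizer.
corpus = [
    "நான் பள்ளிக்கு போகிறேன்".split(),
    "நான் வீட்டுக்கு போகிறேன்".split(),
    "அவன் பள்ளிக்கு போகிறான்".split(),
]

model = train_bigram_model(corpus)

# Distribution over the words that follow the first word of the first sentence.
print(model[corpus[0][0]])
```

Each entry of `model` maps a word to the probabilities of the words seen after it in the corpus, which is the kind of table a speech recognizer consults when ranking candidate word sequences.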
