How do I get those from the Wikimedia dump? I also need a speech corpus, since I have just started recording audio for my research.
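For anyone following the thread: a rough, standard-library-only sketch of pulling page text out of a dump. The real file (e.g. the pages-articles XML from dumps.wikimedia.org) is namespaced and bz2-compressed; `SAMPLE` below is just a tiny hypothetical stand-in so the snippet runs on its own.

```python
# Sketch: extract raw <text> content from a MediaWiki XML dump stream
# using only the standard library. Real dumps use a namespace on every
# tag, so we strip the "{...}" prefix before comparing tag names.
import io
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a dump file (real dumps are much larger
# and are opened with bz2.open(...) instead of io.BytesIO).
SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Example</title>
    <revision><text>Some article text.</text></revision>
  </page>
</mediawiki>"""

def extract_texts(stream):
    """Return the <text> content of every page in a dump stream."""
    texts = []
    for _, elem in ET.iterparse(stream):
        # elem.tag looks like "{http://...}text"; keep the local name.
        if elem.tag.rsplit("}", 1)[-1] == "text" and elem.text:
            texts.append(elem.text)
            elem.clear()  # free parsed subtrees: keeps memory bounded
    return texts

print(extract_texts(io.BytesIO(SAMPLE.encode("utf-8"))))
# → ['Some article text.']
```

Note the markup inside `<text>` is still raw wikitext on a real dump; stripping templates and links needs a further cleaning pass.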
On Fri, Jan 6, 2012 at 8:08 PM, JAGANADH G <[email protected]> wrote:
>> Pardon my ignorance. What do you mean by language model ?
>
> A language model is a statistical model built from a data set. Here I
> think the OP is talking about creating a language model for speech
> processing. An N-gram is one kind of language model:
> http://en.wikipedia.org/wiki/N-gram
>
>> And by Tamil-corpus do you mean a large collection of tamil text ?
>
> A corpus, in the context of Natural Language Processing, is a large
> collection of text.
>
> There are different types of corpora, such as text corpora, speech
> corpora, image corpora, etc.
>
> Here the OP requires a text corpus. I think he can use the Tamil
> Wikipedia dump as a corpus for his research, or he can populate a
> corpus from newspaper RSS feeds and Tamil blog feeds.
>
> --
> **********************************
> JAGANADH G
> http://jaganadhg.in
> *ILUGCBE*
> http://ilugcbe.org.in
> _______________________________________________
> ILUGC Mailing List:
> http://www.ae.iitm.ac.in/mailman/listinfo/ilugc
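To make the N-gram idea above concrete: a minimal bigram model is just maximum-likelihood estimates from raw counts. This is a toy sketch (no smoothing, tiny made-up corpus standing in for text from a dump or feeds); real speech-recognition toolkits of course do much more.

```python
# Sketch: a bigram language model by maximum likelihood, i.e.
# P(word | prev) = count(prev, word) / count(prev).
from collections import Counter

# Toy corpus; in practice this would be tokenized Tamil text.
corpus = "the cat sat on the mat the cat ran".split()

bigrams = Counter(zip(corpus, corpus[1:]))  # counts of adjacent pairs
unigrams = Counter(corpus)                  # counts of single words

def p(word, prev):
    """P(word | prev), unsmoothed: zero for unseen bigrams."""
    return bigrams[(prev, word)] / unigrams[prev]

print(p("cat", "the"))
# → 0.6666666666666666  ("the cat" occurs twice out of three "the"s)
```

Unseen bigrams get probability zero here, which is why practical language-modelling tools apply smoothing (e.g. Katz or Kneser-Ney) before use in a recognizer.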
