How do I get those from the Wikimedia dump? I also need a speech corpus, since I have just started recording audio for my research.
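For anyone following the thread: a rough, standard-library-only sketch of pulling page text out of a dump. The real file (e.g. the pages-articles XML from dumps.wikimedia.org) is namespaced and bz2-compressed; `SAMPLE` below is just a tiny hypothetical stand-in so the snippet runs on its own.

```python
# Sketch: extract raw <text> content from a MediaWiki XML dump stream
# using only the standard library. Real dumps use a namespace on every
# tag, so we strip the "{...}" prefix before comparing tag names.
import io
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a dump file (real dumps are much larger
# and are opened with bz2.open(...) instead of io.BytesIO).
SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Example</title>
    <revision><text>Some article text.</text></revision>
  </page>
</mediawiki>"""

def extract_texts(stream):
    """Return the <text> content of every page in a dump stream."""
    texts = []
    for _, elem in ET.iterparse(stream):
        # elem.tag looks like "{http://...}text"; keep the local name.
        if elem.tag.rsplit("}", 1)[-1] == "text" and elem.text:
            texts.append(elem.text)
            elem.clear()  # free parsed subtrees: keeps memory bounded
    return texts

print(extract_texts(io.BytesIO(SAMPLE.encode("utf-8"))))
# → ['Some article text.']
```

Note the markup inside `<text>` is still raw wikitext on a real dump; stripping templates and links needs a further cleaning pass.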
On Fri, Jan 6, 2012 at 8:08 PM, JAGANADH G <[email protected]> wrote:
>> Pardon my ignorance. What do you mean by language model ?
>
> A language model is a statistical model built from a data set. Here I
> think the OP is talking about creating a language model for speech
> processing. An N-gram is one kind of language model:
> http://en.wikipedia.org/wiki/N-gram
>
>> And by Tamil-corpus do you mean a large collection of tamil text ?
>
> A corpus, in the context of Natural Language Processing, is a large
> collection of text.
>
> There are different types of corpora, such as text corpora, speech
> corpora, image corpora, etc.
>
> Here the OP requires a text corpus. I think he can use the Tamil
> Wikipedia dump as a corpus for his research, or he can populate a
> corpus from newspaper RSS feeds and Tamil blog feeds.
>
> --
> **********************************
> JAGANADH G
> http://jaganadhg.in
> *ILUGCBE*
> http://ilugcbe.org.in
> _______________________________________________
> ILUGC Mailing List:
> http://www.ae.iitm.ac.in/mailman/listinfo/ilugc
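To make the N-gram idea above concrete: a minimal bigram model is just maximum-likelihood estimates from raw counts. This is a toy sketch (no smoothing, tiny made-up corpus standing in for text from a dump or feeds); real speech-recognition toolkits of course do much more.

```python
# Sketch: a bigram language model by maximum likelihood, i.e.
# P(word | prev) = count(prev, word) / count(prev).
from collections import Counter

# Toy corpus; in practice this would be tokenized Tamil text.
corpus = "the cat sat on the mat the cat ran".split()

bigrams = Counter(zip(corpus, corpus[1:]))  # counts of adjacent pairs
unigrams = Counter(corpus)                  # counts of single words

def p(word, prev):
    """P(word | prev), unsmoothed: zero for unseen bigrams."""
    return bigrams[(prev, word)] / unigrams[prev]

print(p("cat", "the"))
# → 0.6666666666666666  ("the cat" occurs twice out of three "the"s)
```

Unseen bigrams get probability zero here, which is why practical language-modelling tools apply smoothing (e.g. Katz or Kneser-Ney) before use in a recognizer.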
