IMHO building a language model out of Tamil Wikipedia is a bad idea.

> It has lots of colloquial terms and modern/mixed words, and its sentences
> are similar to everyday conversations.

The OP requires the same type of material. If colloquial words are present, that will be an added advantage for building speech systems for the Tamil language. Note that there is NO FOSS corpus for Tamil or any other Indian language.

-- 
**********************************
JAGANADH G
http://jaganadhg.in
*ILUGCBE*
http://ilugcbe.org.in

_______________________________________________
ILUGC Mailing List: http://www.ae.iitm.ac.in/mailman/listinfo/ilugc
