Hi Selva,

Moses is language-independent, so you can use it for any language pair as long as you have a parallel corpus. That said, you may have to do language-specific pre- and post-processing. For instance,
- Moses does not have a tokenizer for Tamil.
- Tamil is agglutinative, so you may want to segment words as an additional pre-processing step to reduce data sparsity.

You can use the Indic NLP library for simple tokenization as well as for segmenting the text (http://anoopkunchukuttan.github.io/indic_nlp_library).

Since English and Tamil have different word orders, you should try syntax-based models (which are implemented in the Moses package). Another option is to pre-order the English sentences into Tamil word order before training a phrase-based system. You can use this for pre-ordering: http://www.cfilt.iitb.ac.in/~moses/download/cfilt_preorder/register.html

Regards,
Anoop.

On Thu, Jul 14, 2016 at 11:43 AM, Selva Nalladurai <selva...@gmail.com> wrote:

> Hello Team,
> I am Selva Nalladurai, doing my ME (Master of Engineering) in CSE, and I am
> from India. I am doing my research in SMT and would like to know whether
> Moses can be used for translating English to Tamil (a regional language
> spoken in our country). Can I follow the same steps as for other language
> pairs when uploading my corpus?
> Thank you.
>
> Regards,
> Selva Nalladurai
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
>

--
I claim to be a simple individual liable to err like any other fellow mortal. I own, however, that I have humility enough to confess my errors and to retrace my steps.
http://flightsofthought.blogspot.com
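The pre-processing suggested in the reply above can be roughly sketched as follows. This is only an illustration under stated assumptions, not the Indic NLP library's API and not part of Moses: `simple_tokenize` and `prepare_parallel` are hypothetical helper names, the tokenizer merely pads ASCII punctuation with spaces as a crude stand-in for a real Tamil tokenizer and segmenter, and the `max_len` cutoff of 80 tokens is an arbitrary choice. It produces sentence-aligned, tokenized pairs, one sentence per line on each side, which is the shape of input a parallel corpus needs before training.

```python
import re
import string

# Hypothetical stand-in for a real tokenizer: it only separates ASCII
# punctuation from the neighbouring words. Tamil letters and vowel signs
# are left untouched, so Tamil words are not broken apart.
_PUNCT = re.compile("([" + re.escape(string.punctuation) + "])")


def simple_tokenize(sentence):
    """Pad ASCII punctuation with spaces, then split on whitespace."""
    return _PUNCT.sub(r" \1 ", sentence).split()


def prepare_parallel(src_lines, tgt_lines, max_len=80):
    """Tokenize both sides and keep only non-empty, reasonably short pairs.

    max_len is an assumed cutoff, not a value taken from the thread.
    """
    if len(src_lines) != len(tgt_lines):
        raise ValueError("source and target must have the same number of lines")
    pairs = []
    for src, tgt in zip(src_lines, tgt_lines):
        src_tok = simple_tokenize(src)
        tgt_tok = simple_tokenize(tgt)
        if not src_tok or not tgt_tok:
            continue  # drop pairs where either side is empty
        if len(src_tok) > max_len or len(tgt_tok) > max_len:
            continue  # drop very long sentences
        pairs.append((" ".join(src_tok), " ".join(tgt_tok)))
    return pairs


if __name__ == "__main__":
    english = ["I am going home.", ""]
    tamil = ["நான் வீட்டுக்கு போகிறேன்.", "x"]
    for en, ta in prepare_parallel(english, tamil):
        print(en)
        print(ta)
```

In practice the stand-in tokenizer would be replaced by a proper Tamil tokenizer and morphological segmenter, and the English and Tamil sides would each be written to their own file, with line N of one file corresponding to line N of the other.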