Thanks to both John and Amittai for pointing me in the right direction. The domain I am working in is movies.
For this reason I am having trouble getting movie scripts in the source language. When I started the project I wondered whether I could extract the raw movie audio and run speech recognition on it, but obviously that is a project in itself.

It's easy to get Hindi-to-English translations from subtitles. Now that Google has added Hindi to its list of languages, I guess I could take those translations and run them through it to prepare a parallel corpus, but this will take a long time, since I will have to process one movie at a time. The good thing is that a Hindi movie runs about 2-3 hours, so even a few movies should give me heaps of parallel text to work with.

Again, thanks a lot.

Regards,
Vineet
_______________________________________________
Moses-support mailing list
http://mailman.mit.edu/mailman/listinfo/moses-support
