Thanks to both John and Amittai for pointing me in the right direction.

The domain I am working in is movies.

For this reason I am having trouble getting movie scripts
in the source language. Initially, when I started the project,
I was wondering if there was a way to extract the raw movie audio
and do some speech recognition, but obviously that in itself
is a project.

It's easy to get Hindi-to-English translations from subtitles.

I guess now that Google has added Hindi to its list,
I could take the translations and get them in English to
prepare a parallel corpus, but this will take a long time.

I will have to take one movie at a time. The good thing is that
one Hindi movie is about 2-3 hours long.

I guess by considering only a few movies I will have heaps of parallel
text to work with.

Again, thanks a lot.

Regards

Vineet


_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
