I thought this prospective source might be of interest to some of the list members for your various experiments.
Please ignore this message, and accept my apologies, if it does not interest you.

In my random browsing and searching for multilingual texts for training, I stumbled across the following:

http://www.watchtower.org/languages.htm

The good thing about this source is that a portion of the content is dynamic, so, just like the Europarl corpus, the potential multilingual corpus we could harvest grows monthly. Some of the languages have more support than others, but I suppose that's life.

I'm thinking of developing a Perl script to scrape this material and align it at the paragraph/sentence level for training our systems. Is this something that any of you would be interested in using and/or participating in? If so, please drop me an off-list mail, even if it is just to express an interest.

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
