I thought this prospective source might be of interest to some list members for
your various experiments.

Please ignore this message, and accept my apologies, if it does not interest
you.

While browsing and searching for multilingual texts for training, I stumbled
across the following:

http://www.watchtower.org/languages.htm

The good thing about the source is that a portion of the content is dynamic, so,
just like the Europarl corpus, the potential multilingual corpus we could harvest
grows monthly.

Some of the languages have more support than others, but I suppose that's life.
I'm thinking of developing a Perl script to scrape this material and align it at
the paragraph/sentence level for training our systems. Is this something that any
of you would be interested in using and/or participating in?
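
To give a rough idea of the alignment step, here is a minimal sketch in Python
(not the script itself, which does not exist yet). It assumes the two language
versions of a page have already been fetched and split into paragraphs. The
function name align_paragraphs and the max_ratio threshold are made up for
illustration, and pairing by position is only a naive starting point, not a
proper length-based aligner.

```python
def align_paragraphs(src_paras, tgt_paras, max_ratio=2.0):
    """Pair paragraphs by position and drop pairs with very different lengths.

    src_paras, tgt_paras: lists of paragraph strings for the same page in two
    languages (assumed to be extracted already).
    max_ratio: hypothetical cutoff on longer/shorter character length.
    """
    # If the paragraph counts differ, positional pairing is unreliable,
    # so skip the whole page rather than guess.
    if len(src_paras) != len(tgt_paras):
        return []

    pairs = []
    for src, tgt in zip(src_paras, tgt_paras):
        src, tgt = src.strip(), tgt.strip()
        if not src or not tgt:
            continue  # ignore empty paragraphs
        ratio = max(len(src), len(tgt)) / min(len(src), len(tgt))
        if ratio <= max_ratio:
            pairs.append((src, tgt))
    return pairs
```

Pages that come back empty from this check could be set aside for a smarter
aligner later on.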

If so, please drop me an off-list mail. Even if it is just to express an
interest.
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
