alex bartoli munoz <traviesusmaximu...@...> writes:

> Hello,
>
> We are trying to compile and train Moses to translate a huge number of documents. We followed the steps described in http://www.statmt.org/moses_steps.html (Moses Installation and Training Run-Through), but we changed the corpus and used the Europarl corpus for a couple of languages. I would highly appreciate it if you could answer some of our questions:
>
> 1. Is it possible to achieve something similar to the online demo with a 4-core machine (6 GB RAM)?
> 2. Is it necessary to train with the full Europarl corpus?
> 3. We plan to translate large amounts of text. How fast is Moses on large amounts of text?
> 4. Does anybody have trained files, so that we can achieve good quality without having to retrain on the whole corpus? Repositories, private files, anything would be of great help.
> 5. The documentation explains that we need four preprocessing steps for the Europarl corpus: tokenize, lowercase, remove the XML tags, and strip empty lines. I have removed the XML tags and stripped the empty lines with a custom script, because I haven't found any such script in Moses. Are these scripts available somewhere?
>
> Could you please help us by answering these questions? Any help will be very much appreciated.
>
> _______________________________________________
> Moses-support mailing list
> moses-supp...@...
> http://mailman.mit.edu/mailman/listinfo/moses-support
Hi, Alex,

Today I announced on this mailing list a tool, Moses for Mere Mortals, that will help you work with Moses. URL: http://code.google.com/p/moses-for-mere-mortals/downloads/list?can=2&q=&sort=&colspec=Filename+Summary+Uploaded+Size+DownloadCount

It is geared more toward personal translation memory files, but it can also handle the Europarl corpus. A machine like yours should be able to handle, say, 6 million segments. More important than the number of segments, however, are the domains you want to cover: to get good results, they should be well represented in the training corpus. That is why you will probably be better off if you build a corpus from your own (or your group's) TMX files.

I have trained files, but unfortunately I cannot give them to you, since some of them are confidential.

Moses for Mere Mortals automates the whole training process, so you do not need to run each step separately. It covers only non-factored training, but we are getting BLEU scores of 60 and above (depending on the language pair). I know I may be biased, but our users say the results are good. Give it a try: it includes a small demo corpus that will show you what you can get.

João
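
On question 5 (removing XML tags and stripping empty lines), here is a minimal Python sketch of what such a script might look like. It is not a Moses script. The function names `strip_tags` and `clean_parallel` are made up for illustration, and it assumes the two sides of the corpus are already sentence-aligned line by line. Pairs are dropped together, so that removing an empty line on one side does not shift the alignment on the other.

```python
import re

# Matches anything that looks like an XML/SGML tag, e.g. <SPEAKER ID=1>.
# Caveat: it would also eat ordinary text sitting between a "<" and a ">".
TAG_RE = re.compile(r"<[^>]+>")


def strip_tags(line):
    """Remove tag-like spans and collapse the whitespace left behind."""
    return " ".join(TAG_RE.sub(" ", line).split())


def clean_parallel(src_lines, tgt_lines):
    """Return (src, tgt) lists with tags removed and empty pairs dropped.

    A pair is dropped when either side is empty after tag removal, so the
    two outputs always have the same number of lines.
    """
    src_out, tgt_out = [], []
    for src, tgt in zip(src_lines, tgt_lines):
        src, tgt = strip_tags(src), strip_tags(tgt)
        if src and tgt:
            src_out.append(src)
            tgt_out.append(tgt)
    return src_out, tgt_out


if __name__ == "__main__":
    # Tiny made-up sample, not real Europarl data.
    en = ["<CHAPTER ID=1>", "Resumption of the session", ""]
    es = ["<CHAPTER ID=1>", "Reanudación del período de sesiones", ""]
    for s, t in zip(*clean_parallel(en, es)):
        print(s, "|||", t)
```

Tokenizing and lowercasing would still have to be run on the cleaned files afterwards; this sketch only covers the two steps asked about.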
