Hi Alex

> I would highly appreciate it if you could answer some of the questions we
> have:
>
> 1. Is it possible to achieve something similar to the online demo
> with a 4-core machine (6GB RAM)?

The machine used to host the Moses online demo (four language pairs) is 8-core, with 48GB of RAM. However, we only use about 1.5GB of RAM for each language pair.

> 2. Is it necessary to train with the full Europarl corpus?

In general, more data gives better results. The online demo is trained on the whole of Europarl.

> 3. We plan to translate large amounts of text. How fast is Moses on
> large amounts of text?

Well, that really depends on your model, your hardware and your data. Some figures for one particular setup are here:
http://www.mtmarathon2010.info/web/Program_files/art-haddow.pdf
Around one second per sentence is a good ballpark estimate.

> 4. Does anybody have trained files so we can achieve good quality
> without having to retrain on the whole corpus? Some repositories, private,
> anything would be of great help.

Training is fairly straightforward if you use the scripts provided. I'm not personally aware of any trained models for download, but then I've never looked for them.

> 5. The documentation explains that we need to do 4 preprocessing steps for
> the Europarl corpus: tokenize, lowercase, take the XML tags off and strip
> empty lines. I have taken the XML tags off and stripped the empty lines with
> a script I wrote myself, because I haven't found any script in Moses. Are
> these scripts available somewhere?

I think scripts/training/clean-corpus-n.perl does what you want.

Hope that helps,
regards,
Barry

-- 
The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.

_______________________________________________
Moses-support mailing list
http://mailman.mit.edu/mailman/listinfo/moses-support
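The ballpark figure of around one second per sentence can be turned into a rough time budget. The minimal Python sketch below does this. The function `estimate_hours` is hypothetical, and it assumes throughput scales linearly with the number of parallel decoding workers, which is optimistic. The actual rate depends on the model, the hardware and the data.

```python
def estimate_hours(num_sentences: int, seconds_per_sentence: float = 1.0,
                   parallel_workers: int = 1) -> float:
    """Rough wall-clock estimate (in hours) for translating a batch of sentences.

    ASSUMPTION: throughput scales linearly with the number of parallel
    workers. In practice memory use and shared resources usually erode this.
    """
    if num_sentences < 0 or seconds_per_sentence <= 0 or parallel_workers < 1:
        raise ValueError("invalid arguments")
    total_seconds = num_sentences * seconds_per_sentence / parallel_workers
    return total_seconds / 3600.0


if __name__ == "__main__":
    # 100,000 sentences at ~1 s/sentence: one worker vs. four workers
    # (e.g. one per core on a 4-core machine, if memory allows).
    print(f"{estimate_hours(100_000):.1f} h")
    print(f"{estimate_hours(100_000, parallel_workers=4):.1f} h")
```

With these inputs the sketch prints about 27.8 h for a single worker and about 6.9 h for four, which is only as good as the linear-scaling assumption.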
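The cleaning part of question 5 can be illustrated with a short Python sketch for a sentence-aligned corpus. It drops any pair in which either side is a markup-only line, is empty, or is too long, and it always drops both sides together so the two files stay aligned. This is a hypothetical illustration of the general idea, not a port of scripts/training/clean-corpus-n.perl. The function name `clean_parallel`, the markup regex and the 1 to 40 token limits are assumptions chosen for the example. Tokenizing and lowercasing are not covered.

```python
import re
from itertools import zip_longest
from typing import Iterable, Iterator, Tuple

# ASSUMPTION: markup lines look like "<CHAPTER ID=1>" or "<P>" and sit alone on a line.
MARKUP_LINE = re.compile(r"^\s*<[^>]*>\s*$")


def clean_parallel(src_lines: Iterable[str], tgt_lines: Iterable[str],
                   min_tokens: int = 1,
                   max_tokens: int = 40) -> Iterator[Tuple[str, str]]:
    """Yield (source, target) pairs that survive cleaning (hypothetical helper).

    A pair is dropped if EITHER side is markup-only or has a whitespace-token
    count outside [min_tokens, max_tokens]. Empty lines have 0 tokens, so they
    are dropped whenever min_tokens >= 1.
    """
    for src, tgt in zip_longest(src_lines, tgt_lines):
        if src is None or tgt is None:
            raise ValueError("source and target have different numbers of lines")
        src, tgt = src.strip(), tgt.strip()
        if MARKUP_LINE.match(src) or MARKUP_LINE.match(tgt):
            continue
        n_src, n_tgt = len(src.split()), len(tgt.split())
        if not (min_tokens <= n_src <= max_tokens
                and min_tokens <= n_tgt <= max_tokens):
            continue
        yield src, tgt


if __name__ == "__main__":
    fr = ["<CHAPTER ID=1>", "Reprise de la session", "", "Bonjour"]
    en = ["<CHAPTER ID=1>", "Resumption of the session", "", "Hello"]
    for pair in clean_parallel(fr, en):
        print(pair)
    # ('Reprise de la session', 'Resumption of the session')
    # ('Bonjour', 'Hello')
```

Dropping pairs rather than individual lines is the important design choice here. Removing an empty line from only one side would shift every later sentence out of alignment.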
