Hi Alex

>
> I would highly appreciate it if you could answer some of the questions we
> have:
>
> 1.  Is it possible to achieve something similar to the online demo
> with a 4-core machine (6 GB RAM)?

The machine used to host the Moses online demo (four language pairs) has
8 cores and 48G of RAM. However, we only use about 1.5G of RAM for each
language pair, so roughly 6G in total for the four pairs.

>
> 2.  Is it necessary to train with the full europarl corpus?
>

It isn't strictly necessary, but in general more data gives better results.
The online demo is trained on the whole of Europarl.

> 3.  We plan to translate large amounts of text. How fast does Moses run
> on large amounts of text?

Well, that really depends on your model, your hardware and your data. Some
figures for one particular setup are here:
http://www.mtmarathon2010.info/web/Program_files/art-haddow.pdf
Around one second per sentence is a good ballpark estimate.
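
As a back-of-the-envelope sketch of what that ballpark means for a large
job: the function below is purely illustrative (its name and parameters are
made up here, not part of Moses). It assumes decoding time grows linearly
with the number of sentences and that running several decoder processes in
parallel gives an ideal speedup, which real setups will only approximate.

```python
def estimate_decode_hours(num_sentences, seconds_per_sentence=1.0, parallel_jobs=1):
    """Rough wall-clock estimate in hours.

    Assumptions (not measurements): cost per sentence is constant and
    parallel jobs scale perfectly.
    """
    if num_sentences < 0:
        raise ValueError("num_sentences must be non-negative")
    if seconds_per_sentence <= 0:
        raise ValueError("seconds_per_sentence must be positive")
    if parallel_jobs < 1:
        raise ValueError("parallel_jobs must be at least 1")
    total_seconds = num_sentences * seconds_per_sentence / parallel_jobs
    return total_seconds / 3600.0


if __name__ == "__main__":
    # One million sentences at one second each, on one process and on four.
    for jobs in (1, 4):
        hours = estimate_decode_hours(1_000_000, 1.0, jobs)
        print(f"{jobs} job(s): about {hours:.0f} hours")
```

Under these assumptions, a million sentences at one second each is on the
order of 280 hours on a single process, so it is worth timing a small sample
of your own data on your own hardware before committing to a setup.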

>
> 4.  Does anybody have trained files so we can achieve a good quality
> without having to retrain the whole corpus? Some repositories, private,
> anything would be of great help.

Training is fairly straightforward if you use the scripts provided. I'm not
personally aware of any trained models for download - but then I've never
looked for them.

>
>
> 5.  The documentation explains that we need to do 4 preprocessing steps
> for the europarl corpus: tokenize, lowercase, take the xml tags off and
> strip empty lines. I have taken the xml tags off and stripped the empty
> lines with a script written for me, because I haven't found any script in
> moses. Are these scripts available somewhere?

I think scripts/training/clean-corpus-n.perl does what you want.
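
If you want to see what the tag-stripping and empty-line steps amount to,
here is a minimal Python sketch. It is not the Moses script and does not
reproduce its behaviour; the function names and the tag pattern are
assumptions made for illustration. The one point it tries to show is that a
parallel corpus has to be filtered pair by pair, so that dropping a line on
one side also drops the matching line on the other and the two files stay
aligned.

```python
import re

# Naive pattern for markup such as <P> or <SPEAKER ID=1 NAME="...">.
# Assumption: the text itself contains no literal '<' ... '>' sequences.
TAG_RE = re.compile(r"<[^>]+>")


def strip_tags(line):
    """Remove XML-style tags and collapse runs of whitespace."""
    return " ".join(TAG_RE.sub(" ", line).split())


def clean_parallel(src_lines, tgt_lines):
    """Yield (source, target) pairs with tags removed.

    A pair is dropped if either side is empty after cleaning, which keeps
    the two sides line-aligned. Assumes both inputs have the same number
    of lines; zip() silently stops at the shorter one.
    """
    for src, tgt in zip(src_lines, tgt_lines):
        src, tgt = strip_tags(src), strip_tags(tgt)
        if src and tgt:
            yield src, tgt


def clean_files(src_in, tgt_in, src_out, tgt_out, encoding="utf-8"):
    """Clean two line-aligned files and write the surviving pairs."""
    with open(src_in, encoding=encoding) as fs, \
         open(tgt_in, encoding=encoding) as ft, \
         open(src_out, "w", encoding=encoding) as out_s, \
         open(tgt_out, "w", encoding=encoding) as out_t:
        for src, tgt in clean_parallel(fs, ft):
            out_s.write(src + "\n")
            out_t.write(tgt + "\n")
```

Tokenizing and lowercasing would still be separate steps on top of this.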

Hope that helps - regards - Barry

-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
