alex bartoli munoz <traviesusmaximu...@...> writes:

> 
> 
> Hello,
>
> We are trying to compile and train Moses to translate a huge amount of
> documents. We follow the steps described in
> http://www.statmt.org/moses_steps.html (Moses Installation and Training
> Run-Through), but we have changed the corpus and use the Europarl corpus
> for a couple of languages. I would highly appreciate it if you could
> answer some of the questions we have:
>
> 1. Is it possible to achieve something similar to the online demo with a
>    4-core machine (6 GB RAM)?
> 2. Is it necessary to train with the full Europarl corpus?
> 3. We plan to translate large amounts of text. How fast is Moses on large
>    amounts of text?
> 4. Does anybody have trained files, so that we can achieve good quality
>    without having to retrain on the whole corpus? Some repositories,
>    private, anything would be of great help.
> 5. The documentation explains that we need four preprocessing steps for
>    the Europarl corpus: tokenize, lowercase, remove the XML tags, and
>    strip empty lines. I have removed the XML tags and stripped the empty
>    lines with a script written for me, because I haven't found any such
>    script in Moses. Are these scripts available somewhere?
>
> Could you please help us by answering these questions? Any help will be
> very much appreciated.
> 
> 
> _______________________________________________
> Moses-support mailing list
> moses-supp...@...
> http://mailman.mit.edu/mailman/listinfo/moses-support
> 

Hi, Alex,

Earlier today I announced on this mailing list a tool, Moses for Mere Mortals,
that should help you work with Moses. URL:
http://code.google.com/p/moses-for-mere-mortals/downloads/list?can=2&q=&sort=&colspec=Filename+Summary+Uploaded+Size+DownloadCount

It is geared more towards personal translation memory files, but it can also
handle the Europarl corpus. A machine like yours should be able to handle, say,
6 million segments. More important than the number of segments, however, are
the domains that you want to cover. To get good results, they should be well
represented in the training corpus. That is why you will probably be better off
if you create a corpus from your own (or your own group's) TMX files. I do have
trained files, but unfortunately I cannot give them to you, since some of them
are confidential.

Moses for Mere Mortals automates the whole training process, so you do not need
to give separate instructions for each step of a training run. It only covers
non-factored training, but we are getting BLEU scores of 60 and above
(depending on the language pair). I know I may be biased, but our users state
that the results are good. Give it a try. It comes with a small demo corpus
that will show you what you can get.
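
On your question 5 (removing XML tags and empty lines): here is a minimal Python sketch of one way to do it. It is not a script shipped with Moses or with Moses for Mere Mortals. The function names (strip_tags, clean_parallel) and the tag pattern are my own assumptions. The one point it tries to illustrate is that, for a parallel corpus, a line should be dropped from both language files at once, so that the two sides stay aligned line by line.

```python
import re

# Assumption: anything between "<" and ">" is markup to be removed.
TAG_RE = re.compile(r"<[^>]+>")


def strip_tags(line):
    """Remove XML-like tags and collapse the remaining whitespace."""
    return " ".join(TAG_RE.sub(" ", line).split())


def clean_parallel(src_lines, tgt_lines):
    """Return (source, target) pairs with tags removed.

    A pair is dropped when either side is empty after tag removal,
    so the two sides of the corpus remain aligned.
    """
    pairs = []
    for src, tgt in zip(src_lines, tgt_lines):
        src, tgt = strip_tags(src), strip_tags(tgt)
        if src and tgt:
            pairs.append((src, tgt))
    return pairs


if __name__ == "__main__":
    # Hypothetical sample lines, only for illustration.
    source = ['<SPEAKER ID=1 NAME="X">', "Resumption of the session", ""]
    target = ['<SPEAKER ID=1 NAME="X">', "Reanudación del período de sesiones", "Hola"]
    for src, tgt in clean_parallel(source, target):
        print(src, "|||", tgt)
```

Tokenizing and lowercasing would still have to be done as separate steps after this one.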

João


