Hi Rubén, 

Are you talking about MMM's Windows TMX extractor? If so, check your TMX
file(s) to make sure they're not corrupted in some way, for example by
TUs that don't all contain the same number of TUVs.
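
A rough way to run that check is sketched below. It assumes Python and its standard xml.etree.ElementTree module. The function name and the "en"/"es" language codes are placeholders of mine, not part of MMM or its extractor. It lists every TU that does not contain exactly one TUV per expected language:

```python
import xml.etree.ElementTree as ET

# xml:lang lives in the XML namespace; older TMX files may use a plain "lang" attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def find_unbalanced_tus(tmx_source, langs=("en", "es")):
    """Return (tu_number, languages_found) for every TU that does not hold
    exactly one TUV per expected language. Hypothetical helper, not MMM code."""
    wanted = sorted(code.lower() for code in langs)
    problems = []
    tu_number = 0
    # iterparse streams the file, so a very large TM need not fit in memory.
    for _event, elem in ET.iterparse(tmx_source, events=("end",)):
        if elem.tag != "tu":
            continue
        tu_number += 1
        found = []
        for tuv in elem.findall("tuv"):
            code = tuv.get(XML_LANG) or tuv.get("lang") or ""
            # Compare on the primary subtag only, so "en-US" counts as "en".
            found.append(code.replace("_", "-").split("-")[0].lower())
        if sorted(found) != wanted:
            problems.append((tu_number, found))
        elem.clear()  # free the finished TU
    return problems
```

Calling find_unbalanced_tus("my_memory.tmx") (file name made up) returns a list you can print or count. Any TU it reports is a candidate for the line-count mismatch.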

"Ready-to-use" is always a subjective measure. TMX extractors support
the TMX format to varying degrees. Plus, the extractor can't improve on
data that's inappropriate for SMT. TMs that come from translation memory
systems contain a lot of data that's not relevant to SMT training.

Tom 

On Thu, 14 Jul 2011 16:39:41 +0200, Rubén de la Fuente wrote:

Dear all,

I am playing around with Moses for Mere Mortals (0.991). I have a very
large EN>ES TM. When I try to extract the corpus with the Windows
add-in, it turns out the EN file has 255 559 lines as opposed to 262 858
in the ES one. This will cause problems in the training phase. Any idea
how to tackle it? I thought the Windows add-in would provide
ready-to-use corpora...

Thanks in advance for your help.

Cheers,

-- 
Rubén de la Fuente
EN/FR>ES translator
[email protected] [1]
+34 686 33 59 97
skype: rudelafuente
twitter: rubendelafuente [2] 

[3]http://es.linkedin.com/in/rubendelafuente [4]
www.wordbonds.es [5]

Wordbonds in Facebook [6]   

Links:
------
[1] mailto:[email protected]
[2] http://twitter.com/rubendelafuente
[3] http://twitter.com/rubendelafuente
[4] http://es.linkedin.com/in/rubendelafuente
[5] http://www.wordbonds.es
[6] http://www.facebook.com/#!/pages/wordbonds/127510957570
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
