I see... Thanks, Tom :)

On 14 July 2011 17:23, Tom Hoar <[email protected]> wrote:

> Hi Rubén,
>
> Are you talking about MMM's Windows TMX extractor? If so, check your TMX
> file(s) to make sure they're not corrupted in some way, for example by not
> having the same number of TUVs in each TU.
>
> "Ready-to-use" is always a subjective measure. TMX extractors have various
> degrees of support for the TMX format. Plus, the extractor can't improve on
> data that's inappropriate for SMT. TMs that come from translation memory
> systems contain a lot of data that's not relevant to SMT training.
>
> Tom
>
>
>
> On Thu, 14 Jul 2011 16:39:41 +0200, Rubén de la Fuente <[email protected]> wrote:
>
> Dear all,
>
> I am playing around with Moses for Mere Mortals (0.991). I have a very
> large EN>ES TM. When I try to extract the corpus with the Windows add-in,
> it turns out the EN file has 255 559 lines as opposed to 262 858 in the ES
> one. This will cause problems in the training phase. Any idea how to tackle
> it? I thought the Windows add-in would provide ready-to-use corpora...
>
> Thanks in advance for your help.
>
> Cheers,
>
>
> --
> Rubén de la Fuente
> EN/FR>ES translator
> [email protected]
> +34 686 33 59 97
> skype: rudelafuente
> twitter: rubendelafuente <http://twitter.com/rubendelafuente>
> http://es.linkedin.com/in/rubendelafuente
> www.wordbonds.es
> Wordbonds in Facebook<http://www.facebook.com/#!/pages/wordbonds/127510957570>
>
>
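Tom's suggestion to check that every TU has matching TUVs can be automated. The script below is a minimal, hypothetical sketch, not part of Moses for Mere Mortals or its Windows add-in. It assumes a TMX file without an XML namespace, language codes beginning with "en" and "es" (e.g. EN-US, es-ES), and segment text inside <seg> elements; the script name check_tmx.py and the function check_tmx() are illustrative only. Besides unbalanced TUs, it also flags segments that contain a line break, on the assumption (not confirmed in this thread) that such a segment would be written as more than one line in one output file and throw the EN/ES line counts out of step.

```python
# Hypothetical helper script (check_tmx.py), not part of Moses for Mere Mortals.
import sys
import xml.etree.ElementTree as ET
from collections import Counter

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def tuv_lang(tuv):
    """Return the base language code of a <tuv>, e.g. 'EN-US' -> 'en'."""
    # TMX 1.4 uses xml:lang; older TMX 1.1 files use a plain "lang" attribute.
    code = tuv.get(XML_LANG) or tuv.get("lang") or ""
    return code.split("-")[0].lower()


def segment_text(tuv):
    """Concatenate all text in the <seg>, including text inside inline tags."""
    seg = tuv.find("seg")
    return "".join(seg.itertext()) if seg is not None else ""


def check_tmx(path, langs=("en", "es")):
    """Return (tu_number, problem) pairs that could desynchronise the output files."""
    problems = []
    tu_number = 0
    # iterparse streams the file instead of building the whole tree up front.
    for _event, elem in ET.iterparse(path, events=("end",)):
        if elem.tag != "tu":
            continue
        tu_number += 1
        tuvs = elem.findall("tuv")
        counts = Counter(tuv_lang(t) for t in tuvs)
        # Assumption: each TU should hold exactly one TUV per language of the pair.
        if any(counts[lang] != 1 for lang in langs):
            problems.append((tu_number, "unbalanced TUVs: %s" % dict(counts)))
        # Assumption: a line break inside a segment becomes an extra output line.
        for t in tuvs:
            if "\n" in segment_text(t):
                problems.append((tu_number, "line break inside %s segment" % tuv_lang(t)))
        elem.clear()  # free the TU's children once it has been checked
    return problems


if __name__ == "__main__":
    for number, problem in check_tmx(sys.argv[1]):
        print("TU %d: %s" % (number, problem))
```

Run it as "python check_tmx.py memory.tmx" (file name illustrative). Each reported TU number points to a unit worth inspecting or removing before extraction; a clean report does not rule out other causes of the mismatch.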


_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
