Re: [Moses-support] Preparing TMX files for use in Moses

Tom Hoar Sun, 13 Mar 2016 04:06:58 -0700

I don't know the tmx2txt.pl script, but I can suggest where to look for problems.

The most frequent problem we have when extracting data from TMX files comes from files that don't comply with the TMX specification, especially in how they use the srclang attribute. The spec says this about identifying the source language:


   "the <tuv> holding the source segment will have its xml:lang
   attribute set to the same value as srclang. (except if srclang is
   set to "*all*"). If a <tu> element does not have a srclang attribute
   specified, it uses the one defined in the <header> element."
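The quoted rule is mechanical enough to apply in a few lines. The sketch below is a minimal illustration using Python's standard-library ElementTree; it is not how tmx2txt.pl or extract-tmx-corpus work, and the function names (split_tu, extract, and so on) are hypothetical. It assumes the TMX elements carry no default XML namespace, compares language tags case-insensitively (language tags are case-insensitive), and skips any <tu> whose srclang is "*all*" or that has no matching source/target <tuv>, rather than guessing.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the predeclared xml: prefix to this namespace URI
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def tuv_lang(tuv):
    """Language of a <tuv>: xml:lang (TMX 1.4) or plain lang (older TMX)."""
    return tuv.get(XML_LANG) or tuv.get("lang")


def seg_text(seg):
    """Plain text of a <seg>, flattened to one line.

    Content of <bpt>/<ept>/<ph>/<it>/<ut> is native formatting code and is
    dropped; text inside <hi> is kept, as is the text after each inline tag.
    """
    parts = [seg.text or ""]
    for child in seg:
        if child.tag == "hi":
            parts.append(seg_text(child))
        parts.append(child.tail or "")
    return " ".join("".join(parts).split())


def split_tu(tu, header_srclang, tgt_lang):
    """Return (source, target) for one <tu>, or None if it can't be split."""
    # Spec rule: a <tu> without its own srclang inherits the <header> value
    srclang = tu.get("srclang") or header_srclang
    # "*all*" means any <tuv> may be the source; this sketch just skips it
    if not srclang or srclang == "*all*":
        return None
    src = tgt = None
    for tuv in tu.findall("tuv"):
        seg = tuv.find("seg")
        if seg is None:
            continue
        # Compare language tags lowercased (assumption: case-insensitive)
        lang = (tuv_lang(tuv) or "").lower()
        if lang == srclang.lower():
            src = seg_text(seg)
        elif lang == tgt_lang.lower():
            tgt = seg_text(seg)
    # Moses needs non-empty, line-aligned pairs
    if not src or not tgt:
        return None
    return src, tgt


def extract(tmx_path, tgt_lang, src_out, tgt_out):
    """Write aligned source/target lines as UTF-8; return (kept, skipped)."""
    root = ET.parse(tmx_path).getroot()
    header = root.find("header")
    header_srclang = header.get("srclang") if header is not None else None
    kept = skipped = 0
    with open(src_out, "w", encoding="utf-8") as fs, \
         open(tgt_out, "w", encoding="utf-8") as ft:
        for tu in root.iter("tu"):
            pair = split_tu(tu, header_srclang, tgt_lang)
            if pair is None:
                skipped += 1
                continue
            fs.write(pair[0] + "\n")
            ft.write(pair[1] + "\n")
            kept += 1
    return kept, skipped
```

A large "skipped" count relative to "kept" is the symptom described here: the file's srclang declarations don't line up with its xml:lang values, so a spec-following extractor finds no source segments.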

Sadly, many TMX creation tools, including tools from SDL, do not properly identify the source language. Each tool that looks for the source language TUV according to the spec handles erroneous TMX segments in its own way. So, you need to learn how your TMX declares the srclang attribute, and then study the script to see where there's a mismatch.
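For the first step, learning how a given TMX declares its source language, a quick inventory helps. This is a hypothetical helper, not the approach from the post linked below: it reports the <header> srclang, the per-<tu> srclang values, the <tuv> language codes actually present, and how many <tu> elements have no <tuv> whose language exactly equals their effective srclang. The comparison is deliberately exact, so case differences such as "en-US" vs "EN-US" also show up as mismatches.

```python
import collections
import pprint
import sys
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def srclang_report(root):
    """Summarize how a parsed TMX declares its source language."""
    header = root.find("header")
    header_srclang = header.get("srclang") if header is not None else None
    tu_srclangs = collections.Counter()
    tuv_langs = collections.Counter()
    unmatched = 0
    for tu in root.iter("tu"):
        own = tu.get("srclang")
        tu_srclangs[own or "(inherited)"] += 1
        # Spec rule: fall back to the <header> srclang
        effective = own or header_srclang
        langs = [tuv.get(XML_LANG) or tuv.get("lang") or "(none)"
                 for tuv in tu.findall("tuv")]
        tuv_langs.update(langs)
        # Exact match required; "*all*" is allowed by the spec
        if effective != "*all*" and effective not in langs:
            unmatched += 1
    return {
        "header_srclang": header_srclang,
        "tu_srclang": dict(tu_srclangs),
        "tuv_lang": dict(tuv_langs),
        "unmatched_tu": unmatched,
    }


if __name__ == "__main__":
    # Usage: python srclang_report.py export.tmx
    pprint.pprint(srclang_report(ET.parse(sys.argv[1]).getroot()))
```

If "unmatched_tu" is close to the total number of <tu> elements, compare the "header_srclang" value against the keys of "tuv_lang"; that difference is usually the mismatch the extraction script trips over.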

You can see how we managed these sloppy TMX files in this post, only a week old: https://pttools.freshdesk.com/discussions/topics/6000034251


Hope this helps.

Tom


On 3/12/2016 8:57 PM, [email protected] wrote:

Date: Sat, 12 Mar 2016 13:42:05 +0100
From: Sašo Kuntaric <[email protected]>
Subject: [Moses-support] Preparing TMX files for use in Moses
To: [email protected]

Hi all,

I have a question that is not connected directly to Moses. I am trying to
prepare the corpora for training my engine. I have exported a few of my TMs
to the TMX format and now I am trying to create two separate UTF-8 text
files. I have tried it with the extract-tmx-corpus and tmx2txt.pl tools. I
get empty text files for both (the former tool claims that the input file
can't be read). Are there any special settings I need to set when extracting
the TMX files? I am using SDL Trados Studio 2015 for exporting the files.

Has anyone come across anything like this?

-- Best regards, Sašo

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
