There are several tools in M4Loc to process TMX files (e.g. convert them to parallel corpus files):
https://code.google.com/p/m4loc/wiki/TMXTools

The Perl scripts are platform independent. Adobe provided the TMX to Moses Corpus Tool, so I'm not too familiar with it (indeed, the install packages are available only for the Mac, the only platform where both the UI package Adobe AIR and the Moses Unix scripts are fully supported).

There is also https://code.google.com/p/extract-tmx-corpus/

I think there is at least one more tool to convert TMX into a parallel corpus, but the name escapes me right now.

Achim

From: [email protected] [mailto:[email protected]] On Behalf Of Hieu Hoang
Sent: Saturday, May 10, 2014 5:11 PM
To: Ricardo Cabello Sánchez; [email protected]
Subject: Re: [Moses-support] Preparing training data

The Moses4Localization project might have the tools you need to deal with TMX:
https://code.google.com/p/m4loc/

Moses itself only accepts raw text files.

On 05/05/2014 11:58, Ricardo Cabello Sánchez wrote:

Dear all,

I would like to ask how to prepare data to train the system. I have a lot of translation data in XML files: one with the original text and one with the translated text, and I would like to know how to prepare them.

Would I need to convert the files into text files without tags? Is there some way to use a TMX file to train the system?

Thank you and best regards,
Ricardo

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
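
For readers who want to see roughly what such a TMX-to-corpus conversion involves, here is a minimal sketch in Python. It is not the M4Loc or extract-tmx-corpus code; the function names (seg_text, tmx_to_pairs, write_corpus) are made up for illustration. It assumes each <tu> holds one <tuv> per language with a single <seg>, that language codes match exactly (so "en" will not match "en-US"), and that the content of inline tags such as <ph> or <bpt> can simply be dropped. The result is two line-aligned plain-text files, which is the kind of raw text input mentioned above.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the xml:lang attribute to this qualified name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def seg_text(seg):
    """Return the plain text of a <seg>, dropping inline-tag content.

    Assumption: text inside inline elements (ph, bpt, ept, ...) is markup
    and is discarded; only the text between those elements is kept.
    """
    parts = [seg.text or ""]
    for child in seg:
        parts.append(child.tail or "")
    # Collapse whitespace so that each segment fits on a single line.
    return " ".join("".join(parts).split())


def tmx_to_pairs(tmx_text, src_lang, tgt_lang):
    """Extract (source, target) pairs from a TMX document given as a string."""
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):
        by_lang = {}
        for tuv in tu.iter("tuv"):
            # TMX 1.4 uses xml:lang; older files may use a plain lang attribute.
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                by_lang[lang] = seg_text(seg)
        src = by_lang.get(src_lang.lower())
        tgt = by_lang.get(tgt_lang.lower())
        # Keep only units that have non-empty text on both sides.
        if src and tgt:
            pairs.append((src, tgt))
    return pairs


def write_corpus(pairs, src_path, tgt_path):
    """Write the pairs to two files, one segment per line, line-aligned."""
    with open(src_path, "w", encoding="utf-8") as fs, \
         open(tgt_path, "w", encoding="utf-8") as ft:
        for src, tgt in pairs:
            fs.write(src + "\n")
            ft.write(tgt + "\n")
```

A call such as write_corpus(tmx_to_pairs(open("memory.tmx", encoding="utf-8").read(), "en", "es"), "corpus.en", "corpus.es") would produce the two files; the file names here are placeholders. Real translation memories often need more care (regional language codes, several segments per unit, duplicate or empty units), which is what the dedicated tools listed above are for.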
