Re: [Moses-support] Preparing training data

Achim Ruopp Sun, 11 May 2014 17:28:22 -0700

There are several tools in M4Loc to process TMX files (e.g. convert them to
parallel corpus files):


https://code.google.com/p/m4loc/wiki/TMXTools 

The Perl scripts are platform independent. Adobe provided the TMX to Moses
Corpus Tool, so I'm not too familiar with it (indeed, the install packages
are available only for the Mac, the only platform where both the UI package
Adobe AIR and the Moses Unix scripts are fully supported).

 

There is also https://code.google.com/p/extract-tmx-corpus/ 

 

I think there is at least one more tool to convert TMX into a parallel
corpus, but the name escapes me right now.

 

Achim 

 

 

From: [email protected] [mailto:[email protected]]
On Behalf Of Hieu Hoang
Sent: Saturday, May 10, 2014 5:11 PM
To: Ricardo Cabello Sánchez; [email protected]
Subject: Re: [Moses-support] Preparing training data

 

The Moses4Localization project might have the tools you need to deal with TMX:
  https://code.google.com/p/m4loc/
Moses itself only accepts raw text files.
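
To illustrate what such a conversion involves, here is a rough sketch (not one
of the M4Loc tools, and not tested against real-world TMX exports) that reads a
TMX file and writes two line-aligned plain-text files, one per language. The
function names, the file names in the usage note, and the exact-match handling
of language codes are all assumptions made for the example. As a
simplification, inline formatting elements inside <seg> are dropped and only
the text around them is kept.

```python
import xml.etree.ElementTree as ET

# TMX 1.4 uses xml:lang on <tuv>; older files may use a plain "lang" attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def seg_text(seg):
    """Return the text of a <seg>, skipping the content of inline elements."""
    parts = [seg.text or ""]
    for child in seg:
        # Keep only the text that follows each inline element (its tail).
        parts.append(child.tail or "")
    # Collapse newlines and repeated spaces so each segment stays on one line.
    return " ".join("".join(parts).split())


def tmx_to_pairs(tmx_source, src_lang, tgt_lang):
    """Yield (source, target) pairs from a TMX path or binary file object.

    Language codes are compared case-insensitively but otherwise exactly,
    so "en" will not match "en-US" (an assumption of this sketch).
    """
    root = ET.parse(tmx_source).getroot()
    for tu in root.iter("tu"):
        segs = {}
        for tuv in tu.iter("tuv"):
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                segs[lang] = seg_text(seg)
        src = segs.get(src_lang.lower())
        tgt = segs.get(tgt_lang.lower())
        # Skip units where either side is missing or empty, so the two
        # output files stay aligned line by line.
        if src and tgt:
            yield src, tgt


def write_corpus(pairs, src_path, tgt_path):
    """Write the pairs to two UTF-8 files, one segment per line."""
    with open(src_path, "w", encoding="utf-8") as fs, \
         open(tgt_path, "w", encoding="utf-8") as ft:
        for src, tgt in pairs:
            fs.write(src + "\n")
            ft.write(tgt + "\n")
```

Something like write_corpus(tmx_to_pairs("memory.tmx", "en", "es"),
"corpus.en", "corpus.es") would then produce the two files (the file names
here are placeholders). The output would still need the usual tokenization and
cleaning steps before training.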

On 05/05/2014 11:58, Ricardo Cabello Sánchez wrote:

Dear all, 

 

I would like to ask how to prepare data to train the system.

I have a lot of translation data in XML files: one with the original text
and one with the translated text, and I would like to know how to prepare
them. Do I need to convert the files into text files without tags? Is there
some way to use a TMX file to train the system?

 

Thank you and best regards, 

 

Ricardo






_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support

