Hi Taylor,

 The moses-support team does not support DoMY. I'll answer your 
 specific DoMY questions separately, but I can share some general 
 thoughts here.

 Fully automating the end-to-end process from .tmx files to a trained 
 translation model is challenging. It's often necessary to insert 
 break points where localization engineers and linguists can review the 
 extracted data and catch any corruption carried over from the .tmx data.

 In general, going from .tmx to a trained engine with DoMY means running 
 the following "graphs" in the order below. Note: the term "graph" 
 refers to a parallel toolchain or pipeline that keeps data synchronized 
 (aligned) between two or more languages. It comes from the multimedia 
 term "filter graph", as in GStreamer on Linux and Microsoft's 
 DirectShow, which operate on parallel, synchronized media streams.
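
 To illustrate what "synchronized" means for text, here is a minimal 
 Python sketch. It assumes the usual line-aligned layout, where line N of 
 the source file translates line N of the target file. A filter stage 
 then keeps or drops each sentence pair as a unit, so the two sides never 
 drift apart. The names parallel_filter and within_length and the 
 80-token limit are hypothetical, loosely in the spirit of a length 
 filter like Moses' clean-corpus-n.perl; this is not DoMY's code.

```python
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, str]


def parallel_filter(src: Sequence[str], tgt: Sequence[str],
                    keep: Callable[[str, str], bool]) -> List[Pair]:
    """Return the (source, target) pairs that pass `keep`.

    Pairs are kept or dropped together, so line N of the filtered source
    still aligns with line N of the filtered target.
    """
    if len(src) != len(tgt):
        raise ValueError(
            f"corpora out of sync: {len(src)} vs {len(tgt)} lines")
    return [(s, t) for s, t in zip(src, tgt) if keep(s, t)]


def within_length(s: str, t: str, max_tokens: int = 80) -> bool:
    """Hypothetical predicate: both sides non-empty and <= max_tokens words."""
    return (0 < len(s.split()) <= max_tokens
            and 0 < len(t.split()) <= max_tokens)


if __name__ == "__main__":
    en = ["Hello world .", "", "Good morning ."]
    nl = ["Hallo wereld .", "Leeg", "Goedemorgen ."]
    # The second pair is dropped on BOTH sides because its English side is empty.
    for pair in parallel_filter(en, nl, within_length):
        print(pair)
```

 Dropping only the empty English line would shift every later Dutch line 
 by one, which is exactly the kind of silent corruption the review break 
 points are meant to catch.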

 1) domy import-tmx - extracts tmx data to parallel corpora files 
 (Python)
 2) domy clean-corpus - cleans parallel data, similar to Moses' 
 clean-corpus-n.perl, and also extracts language model data (Python)
 3) domy build-lm - consolidates individual corpus files to master 
 language model and recaser corpus files (Python)
 4) domy build-tm - consolidates individual corpus files to two master 
 parallel files plus supporting dev/eval sets and .sgm files (Python)
 5) train - wrapper for the following sequential steps (Bash scripts)
    a) train-lm - trains language model from corpus in (3)
    b) train-tables - trains the phrase and reordering tables from the 
 corpus in (4)
    c) train-tablesbin - binarizes tables from (5b)
    d) train-recaser - trains recaser model from corpus in (3)
    e) train-mert - tunes a translation model consisting of LM from (5a) 
 and tables from (5c)
    f) train-eval - translates the eval sets from (4) and scores the 
 output with mteval-v12.pl
 6) domy translate - translates new documents using the engine created 
 above (Python)
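
 To make the data flow in the list explicit, here is a small Python 
 sketch (3.9+, for graphlib) that records which step consumes which 
 step's output and checks that the listed order respects those 
 dependencies. The step names come from the list above. The edges for 
 clean-corpus, train-eval and translate are inferred from the 
 descriptions, not taken from DoMY, and nothing here invokes DoMY itself.

```python
from graphlib import TopologicalSorter

# Maps each step to the steps whose output it consumes.
# Assumptions (inferred, not stated outright in the list): clean-corpus reads
# import-tmx output, build-lm/build-tm read clean-corpus output, train-eval
# needs the tuned model from train-mert, and translate needs the tuned model
# plus the recaser.
DEPENDENCIES = {
    "import-tmx": set(),
    "clean-corpus": {"import-tmx"},
    "build-lm": {"clean-corpus"},
    "build-tm": {"clean-corpus"},
    "train-lm": {"build-lm"},
    "train-tables": {"build-tm"},
    "train-tablesbin": {"train-tables"},
    "train-recaser": {"build-lm"},
    "train-mert": {"train-lm", "train-tablesbin"},
    "train-eval": {"build-tm", "train-mert"},
    "translate": {"train-mert", "train-recaser"},
}

# The order given in the numbered list, with step 5 expanded into 5a-5f.
LISTED_ORDER = [
    "import-tmx", "clean-corpus", "build-lm", "build-tm",
    "train-lm", "train-tables", "train-tablesbin", "train-recaser",
    "train-mert", "train-eval", "translate",
]


def respects_dependencies(order, deps):
    """Return True if every step appears after all the steps it depends on.

    `order` must contain every step that appears in `deps`.
    """
    position = {step: i for i, step in enumerate(order)}
    return all(position[dep] < position[step]
               for step, needs in deps.items() for dep in needs)


if __name__ == "__main__":
    print(respects_dependencies(LISTED_ORDER, DEPENDENCIES))  # True
    # One order that a topological sort derives from the same dependencies.
    print(list(TopologicalSorter(DEPENDENCIES).static_order()))
```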

 You need to edit/configure the config.ini files for steps (1)-(4) and 
 also issue a proper command line for (5). Renaming directories should 
 not be necessary if the config.ini files are set up properly.

 If you need help, I'll be happy to take that offline from 
 moses-support.

 Tom


 On Mon, 12 Sep 2011 10:30:30 -0400, Taylor Rose 
 <[email protected]> wrote:
> Hey all,
>
> I've been working with DoMY for about a week and I'm trying to automate
> the process of going from a *.tmx to a trained translation model.
>
> This is my understanding of the sequence so far:
> import-tmx
> rename directories (i.e. en/en/data.txt, en/nl/data.txt)
> clean-corpus
> sa-champollion to align
> build-tm
> build-lm
> train-lm
> ready to translate?
>
> Is my understanding of this correct? I'd also appreciate help with
> formatting the output of graphs. The import-tmx graph outputs a directory
> structure such as '/Test/tm/us_gb/us_gb/nl_nl', but the clean-corpus
> graph expects a structure such as '/Test/tm/en/en/nl'. Is there a way to
> modify the output in the config.ini file, or should I just write a bash
> script to rename everything?
>
> Thanks,
>
>
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
