Dear all,

I would like to apply a word alignment model trained with mgiza. First, I would like to give mgiza all of the training data for model estimation, and later word-align data line by line. The lines come from the same corpus, so re-estimation (e.g. to handle OOVs) is not required.
This seems to be difficult with mgiza, but looking at this example:

https://github.com/moses-smt/mgiza/blob/master/mgizapp/scripts/force-align-moses.sh

I figured out that running mgiza in this fashion:

*mgiza [config file] -m1 0 -m2 0 -m3 0 -mh 0 -m4 1 -restart 11 [all "previous" options]*

seems to reuse all existing model parameters. "Seems to" because the help text of the "restart" option is a bit unclear, it says: "Restart training from a level ... 11: Model 4 and on". *Does anyone know what "Model 4 and on" means?*

Also, "-m4 1" seems to indicate that one iteration of model 4 training is performed, but why? When I use "-m4 0" (because I do not want to retrain anything), this results in a rather puzzling error message:

"You specified to load model 4 and train model 4 (restart == 10)"

I explicitly wrote "-restart 11" in the command, can anyone explain why mgiza thinks I have specified "-restart 10"?

The first command above succeeds, but simply produces word alignments for the whole training corpus again. What I would like to do is word-align a single pair of sentences in the source and target language. I thought of using the parameter "tc = (test corpus file name)" - *does anyone know what format this file must be in?*

Thanks a lot for your help.

Mathias

--
Mathias Müller
AND-2-20
Institute of Computational Linguistics
University of Zurich
Switzerland
+41 44 635 75 81
[email protected]
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
