Dear all

I would like to apply a word alignment model trained with mgiza. First, I
would like to give mgiza all of the training data for model estimation, and
later word-align data line by line. The lines come from the same corpus, so
no re-estimation is required (e.g. because of OOVs).

This seems to be difficult with mgiza, but looking at this example:

https://github.com/moses-smt/mgiza/blob/master/mgizapp/scripts/force-align-moses.sh

I figured out that running mgiza in this fashion:

*mgiza [config file] -m1 0 -m2 0 -m3 0 -mh 0 -m4 1 -restart 11 [all
"previous" options]*

seems to reuse all existing model parameters. I say "seems to" because the
help text of the "restart" option is a bit unclear. It says: "Restart
training from a level ... 11: Model 4 and on". *Does anyone know what
"Model 4 and on" means?* Also, "-m4 1" seems to indicate that one iteration
of model 4 training is performed, but why?
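
For reference, the invocation described above can be sketched as an argument
list built in Python. This is only an illustration of the flags quoted in this
message, not a verified recipe: the function name, the file names and the
single "-previoust" entry are placeholders standing in for [config file] and
[all "previous" options].

```python
from typing import List, Sequence


def build_forced_alignment_args(
    mgiza_binary: str,
    config_file: str,
    previous_options: Sequence[str],
    m4_iterations: int = 1,
) -> List[str]:
    """Assemble the mgiza argument list quoted in this message.

    The helper itself is hypothetical; only the flags come from the message.
    """
    args = [mgiza_binary, config_file]
    # No training iterations for Models 1, 2, 3 and the HMM model.
    for flag in ("-m1", "-m2", "-m3", "-mh"):
        args += [flag, "0"]
    # One Model 4 iteration; "-m4 0" led to the error quoted in this message.
    args += ["-m4", str(m4_iterations)]
    # Restart level 11 is described in the help text as "Model 4 and on".
    args += ["-restart", "11"]
    # Whatever options point mgiza at the existing model files.
    args += list(previous_options)
    return args


if __name__ == "__main__":
    # Placeholder paths, not real files.
    cmd = build_forced_alignment_args(
        "mgiza", "model.gizacfg", ["-previoust", "model.t3.final"]
    )
    print(" ".join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```

Keeping the command as a list makes it easy to swap the corpus-related options
later without touching the flags that control training.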

When I use "-m4 0" (because I do not want to retrain anything), this
results in a rather puzzling error message:

"You specified to load model 4 and train model 4 (restart == 10)"

I explicitly wrote "-restart 11" in the command. Can anyone explain why
mgiza thinks I have specified "-restart 10"?

The first command above succeeds, but simply produces word alignments for
the whole training corpus again. What I would like to do is word-align a
single pair of sentences in the source and target language.

I thought of using the parameter "tc =   (test corpus file name)" - *does
anyone know what format this file must be in?*
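
My working assumption, which I have not verified, is that the test corpus
uses the same numeric "snt" layout as the training corpus: three lines per
sentence pair (an occurrence count, the token IDs of one side, the token IDs
of the other side), with IDs taken from the .vcb vocabulary files whose lines
look like "<id> <token> <count>". Under that assumption, a single sentence
pair could be converted as sketched below. The function names are mine, and
which language goes on which line depends on the alignment direction.

```python
from typing import Dict, Iterable


def load_vocab(lines: Iterable[str]) -> Dict[str, int]:
    """Parse .vcb-style lines '<id> <token> <count>' into a token -> id map."""
    vocab: Dict[str, int] = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vocab[parts[1]] = int(parts[0])
    return vocab


def to_snt_entry(
    first_side: str,
    second_side: str,
    first_vocab: Dict[str, int],
    second_vocab: Dict[str, int],
) -> str:
    """Render one tokenized sentence pair as a three-line snt entry.

    Raises KeyError for a token missing from the vocabulary (an OOV),
    which should not occur when the lines come from the training corpus.
    """
    first_ids = [str(first_vocab[tok]) for tok in first_side.split()]
    second_ids = [str(second_vocab[tok]) for tok in second_side.split()]
    # The leading "1" says that this sentence pair occurs once.
    return "1\n" + " ".join(first_ids) + "\n" + " ".join(second_ids) + "\n"


if __name__ == "__main__":
    # Toy vocabularies; real ones would be read from the existing .vcb files.
    de_vocab = load_vocab(["2 das 10", "3 haus 4"])
    en_vocab = load_vocab(["2 the 12", "3 house 4"])
    print(to_snt_entry("das haus", "the house", de_vocab, en_vocab), end="")
```

If this is not the format that "tc" expects, I would be grateful for a
pointer to the right one.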

Thanks a lot for your help.
Mathias
—

Mathias Müller
AND-2-20
Institute of Computational Linguistics
University of Zurich
Switzerland
+41 44 635 75 81
_______________________________________________
Moses-support mailing list
http://mailman.mit.edu/mailman/listinfo/moses-support