This email is simply to record a (to my knowledge) previously undocumented
aspect of how the Moses training scripts interact with giza++.


I've been looking through moses/scripts/training/train-model.perl and the
execution scripts created by EMS, and I ran across a Perl function called
make_classes, which (not surprisingly) calls mkcls. That made sense to me,
as I assumed that giza++ used the resulting classes. But in examining the
subsequent calls to giza++ (or mgiza), I couldn't find anything in the
Moses training pipeline that actually uses the *.vcb.classes files
produced by the calls to mkcls.

Now, there are certainly use cases where a researcher might want to
explicitly make use of these classes (a class LM, for example). But mkcls
is called by default whenever training Moses using train-model.perl, and in
the general case, I couldn't find any place where these classes are
subsequently used. So I wondered: Am I missing something obvious? Are the
results of mkcls actually used anywhere by default in the Moses training
pipeline?

Judging from the output of mgiza --help, mgiza can accept these class
files, but train-model.perl does not appear to pass them to mgiza
explicitly. So, I tried running mgiza as it was called by train-model.perl,
but in a clean directory containing only the files that were actually
passed to mgiza via command-line flags (the src-tgt.cooc, tgt.vcb, and
src.vcb files). Run this way, mgiza complains:

ERROR: can not read src.vcb.classes
ERROR: can not read tgt.vcb.classes

So, the answer is that mgiza does actually need these files, but
train-model.perl does not explicitly provide them to mgiza. Instead, it
relies on the fact that mgiza by default assumes that each class file sits
in the same location as the corresponding vcb file, under the same name
plus the additional suffix .classes (src.vcb.classes for src.vcb, for
example).
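
For anyone who wants to check for this implicit dependency before launching
mgiza by hand, here is a minimal sketch in Python. It only encodes the naming
convention described above (vcb path plus ".classes"); the function names are
my own and are not part of Moses, giza++, or mgiza.

```python
import os


def expected_class_file(vcb_path):
    """Return the class-file path mgiza appears to assume for a vcb file.

    Assumption (from the behaviour described above): the class file lives
    next to the vcb file and has the same name plus the suffix ".classes".
    """
    return vcb_path + ".classes"


def missing_class_files(vcb_paths):
    """List the assumed class files that do not exist on disk."""
    missing = []
    for vcb_path in vcb_paths:
        class_path = expected_class_file(vcb_path)
        if not os.path.isfile(class_path):
            missing.append(class_path)
    return missing
```

Calling missing_class_files(["src.vcb", "tgt.vcb"]) in the clean directory
from my experiment would return both class-file names, which matches the two
errors mgiza reported.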

Thanks,
Lane