This email is simply to record a (to my knowledge) previously undocumented aspect of how the Moses training scripts interact with giza++.
I've been looking through moses/scripts/training/train-model.perl and the execution scripts created by EMS, and I ran across a Perl function called make_classes, which (not surprisingly) calls mkcls. I assumed that giza++ used the resulting classes. But in examining the subsequent calls to giza++ (or mgiza), I couldn't find anywhere else in the Moses training pipeline that actually uses the *.vcb.classes files produced by mkcls.

Now, there are certainly use cases where a researcher might want to make explicit use of these classes (a class LM, for example). But mkcls is called by default whenever Moses is trained using train-model.perl, and in the general case I couldn't find any place where these classes are subsequently used. So I wondered: Am I missing something obvious? Are the results of mkcls actually used anywhere by default in the Moses training pipeline?

According to the output of mgiza --help, mgiza can accept these class files, but train-model.perl does not explicitly pass them to mgiza. So I tried running mgiza as it is called by train-model.perl, but in a clean directory, providing only the files that were actually passed via command-line flags (the src-tgt.cooc, tgt.vcb, and src.vcb files). Run this way, mgiza complains:

    ERROR: can not read src.vcb.classes
    ERROR: can not read tgt.vcb.classes

So, the answer is that mgiza does actually need these files. train-model.perl does not explicitly provide them; instead it relies on the fact that mgiza defaults to assuming that each class file exists in the same location as the corresponding vcb file, with the same name plus the additional suffix .classes.

Thanks,
Lane
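P.S. For anyone who wants to check for this before launching mgiza by hand, here is a minimal sketch of the naming convention described above. It assumes only what the error messages suggest, namely that the class file is looked up at the vcb path plus ".classes". The function names are my own and are not part of Moses or mgiza.

```python
import os
import tempfile


def expected_class_file(vcb_path):
    # Assumption inferred from the mgiza error messages: the class file
    # sits next to the vcb file, named <vcb path> + ".classes".
    return vcb_path + ".classes"


def missing_class_files(vcb_paths):
    # Return the expected class-file paths that do not exist on disk.
    expected = [expected_class_file(p) for p in vcb_paths]
    return [p for p in expected if not os.path.isfile(p)]
```

Calling missing_class_files(["src.vcb", "tgt.vcb"]) in the directory where mgiza will run returns an empty list when both class files are in place, and otherwise lists the ones that mkcls still needs to produce.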
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
