Dear all, I hope that this is not too stupid a question, and that it hasn't been asked recently.
In the MOSES EMS, when running experiments the phrase table is automatically filtered down to only those phrases that actually occur in the respective dev/test set. This obviously saves a lot of memory without changing the resulting translations.

I was wondering whether something similar can be (or already is) done with the language model. That is, can one reduce the ARPA file to only those words that occur on the target side of the (filtered) phrase table? The objective would of course be to leave the translation result unchanged. Would the LM software renormalize internally if some of the original entries are removed? In that case the results would differ. This may even depend on which toolkit you use to load (rather than train) the ARPA file. I am using SRILM in my own translation programs, but I would also be interested in other toolkits in case they behave more suitably.

Can anyone point me to anything? Many thanks!

Thomas Schoenemann (currently University of Pisa)
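To make concrete what I have in mind, here is a rough sketch of such a vocabulary filter. It is only an illustration, not production code: `filter_arpa` is a name I made up, it assumes the usual tab-separated ARPA entry layout, and it deliberately copies probabilities and backoff weights verbatim (no renormalization), which is exactly the behavior I would want the loading toolkit to have.

```python
def filter_arpa(arpa_lines, vocab):
    """Keep only n-grams whose words all lie in `vocab` (plus the
    sentence markers and <unk>), and recompute the counts in the
    \\data\\ header.  Probabilities and backoff weights are copied
    verbatim, i.e. NOT renormalized, so scores of surviving n-grams
    are unchanged.  Simplified ARPA handling for illustration only."""
    keep = set(vocab) | {"<s>", "</s>", "<unk>"}
    sections = {}   # n-gram order -> surviving entry lines
    order = 0       # 0 while still in the \data\ header
    for line in arpa_lines:
        s = line.strip()
        if s.startswith("\\") and s.endswith("-grams:"):
            # section marker like \1-grams: or \2-grams:
            order = int(s[1:s.index("-")])
            sections[order] = []
        elif order and s and not s.startswith("\\"):
            # entry line: prob <TAB> w1 [w2 ...] [<TAB> backoff]
            fields = s.split("\t")
            words = fields[1].split(" ")
            if all(w in keep for w in words):
                sections[order].append(s)
    out = ["\\data\\"]
    out += ["ngram %d=%d" % (n, len(v)) for n, v in sorted(sections.items())]
    for n, v in sorted(sections.items()):
        out += ["", "\\%d-grams:" % n] + v
    out += ["", "\\end\\"]
    return out
```

Note that dropping a unigram here also drops every higher-order n-gram containing it, so the pruned model can still back off consistently for the words that remain.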
_______________________________________________ Moses-support mailing list [email protected] http://mailman.mit.edu/mailman/listinfo/moses-support
