Hi Sandipan,

First, please post Moses-related questions to [email protected], not to individual contributors.
Second, the current seven features used by Mmsapt / PhraseDictionaryBitextSampling are listed below (for details, see my recent paper on this phrase table implementation: https://www.researchgate.net/publication/267270863_Dynamic_Phrase_Tables_for_Machine_Translation_in_an_Interactive_Post-editing_Scenario ). THE STANDARD SET OF FEATURES MAY CHANGE AT ANY TIME, as this is still work in progress.

- forward and backward lexically smoothed phrase scores (2 scores; same as the standard features)
- rarity penalty (1/(x+1)), where x is the number of phrase pair occurrences in the corpus/sample (1 score)
- the lower bound on the forward and backward phrase-level probabilities, with confidence level .99 (2 scores)
- 2 provenance features (x/(x+1)), where x is the number of phrase pair occurrences in the (static) background and (dynamic) foreground corpus (2 scores)

Third, for good performance you need to retrain the feature weights with any of the standard techniques; I usually use MERT.

The executable simulate-pe allows you to feed in references and word alignments one sentence at a time; there are additional parameters --spe-src, --spe-trg, --spe-aln to specify source, target, and alignment (symal output format). Source and target files are one sentence per line, tokenized.

Michael Denkowski is currently in the process of integrating online tuning into Moses, but I'm not sure whether that's ready to be deployed yet.

Regards - Uli

On Thu, Oct 23, 2014 at 1:47 AM, Sandipan Dandapat <[email protected]> wrote:

> Dear Ulrich,
> I got your reference from Prashanta Mathur. I am a postdoctoral researcher
> in CNGL, DCU, and I am working with Moses incremental retraining. It would
> be great if you could help me with a couple of doubts:
>
> 1. I found there are 7 weights to define for PT0 (PT0 is the Mmsapt name),
> i.e.
>
> Mmsapt name=PT0 output-factor=0 num-features=7
> base=/home/sandipan/inc_retrain/MT_sys/En-Fr/dgt/50_i/mmsa_pt/train.
> L1=en L2=fr
> [weight]
> PT0= 0.1 0.2 0.3 0.4 0.5 0.6 0.7
>
> num-features in a PBSMT model is 4, which does not work with Mmsapt. What
> are these 7 weights? Can I use uniform weights for all 7 features? Or how
> do I adjust these values?
>
> 2. I found there is a significant difference in BLEU score between the
> standard PBSMT model and the Mmsapt-based model. Is this because of the
> weights I am using, or am I doing something wrong?
>
> It would be a great help if you could help me understand the above issues.
> Thank you.
>
> Regards,
> sandipan

--
Ulrich Germann
Research Associate
School of Informatics
University of Edinburgh
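[Editor's note: the count-based feature formulas quoted above (rarity penalty, provenance, and the probability lower bound) can be sketched in a few lines of Python. This is an illustrative reimplementation, not code from Moses itself; in particular, the lower-bound function below uses a Wilson score interval as a stand-in, since the exact statistic Mmsapt computes is the one defined in the paper linked above.]

```python
import math

def rarity_penalty(x):
    """Rarity penalty 1/(x+1), where x is the number of occurrences
    of the phrase pair in the corpus/sample."""
    return 1.0 / (x + 1)

def provenance(x):
    """Provenance feature x/(x+1); computed once for the (static)
    background corpus and once for the (dynamic) foreground corpus."""
    return x / (x + 1.0)

def prob_lower_bound(successes, trials, z=2.326):
    """Illustrative lower confidence bound on a phrase-level probability
    estimate successes/trials, via the one-sided Wilson score interval
    (z = 2.326 corresponds to 99% one-sided confidence). This is a
    stand-in for the statistic Mmsapt actually uses."""
    if trials == 0:
        return 0.0
    p = successes / trials
    denom = 1 + z * z / trials
    centre = p + z * z / (2 * trials)
    margin = z * math.sqrt(p * (1 - p) / trials
                           + z * z / (4 * trials * trials))
    return (centre - margin) / denom
```

For example, a phrase pair seen once gets rarity penalty 0.5, and a pair seen 9 times in the background corpus gets provenance score 0.9; the lower bound is always below the raw relative-frequency estimate and shrinks toward it as counts grow.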
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
