Hi Sandipan,

First, please post Moses-related questions to [email protected], not
to individual contributors.

Second, the current seven features used by Mmsapt /
PhraseDictionaryBitextSampling are listed below. For details, see my
recent paper on this phrase table implementation:
https://www.researchgate.net/publication/267270863_Dynamic_Phrase_Tables_for_Machine_Translation_in_an_Interactive_Post-editing_Scenario

Note that THE STANDARD SET OF FEATURES MAY CHANGE AT ANY TIME, as this is
still work in progress.

- forward and backward lexically smoothed phrase scores (2 scores; same as
standard features)
- rarity penalty (1/(x+1)), where x is the number of phrase pair
occurrences in the corpus/sample (1 score)
- the lower bound on forward and backward phrase-level probabilities, at
confidence level 0.99 (2 scores)
- 2 provenance features (x/(x+1)), where x is the number of phrase pair
occurrences in the (static) background and the (dynamic) foreground
corpus, respectively (2 scores)
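To illustrate the two count-based feature shapes above, here is a small
Python sketch (the function names are mine, not from the Mmsapt code):

```python
def rarity_penalty(joint_count):
    # 1/(x+1): close to 1 for phrase pairs seen once in the
    # corpus/sample, approaching 0 as the count grows
    return 1.0 / (joint_count + 1.0)

def provenance(corpus_count):
    # x/(x+1): close to 0 for rare pairs, approaching 1 as the count
    # grows; computed once against the background corpus and once
    # against the foreground corpus
    return corpus_count / (corpus_count + 1.0)

print(rarity_penalty(1))   # 0.5
print(provenance(3))       # 0.75
```

Both features are bounded in (0, 1), so they combine cleanly with the
log-linear model once their weights are tuned.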

Third, for good performance you need to retrain the feature weights with
any of the standard techniques; I usually use MERT. The executable
simulate-pe allows you to feed in references and word alignments one
sentence at a time; the additional parameters --spe-src, --spe-trg, and
--spe-aln specify the source, target, and alignment files (alignments in
symal output format). Source and target files are tokenized, one sentence
per line. Michael Denkowski is currently in the process of integrating
online tuning into Moses, but I'm not sure whether that's ready to be
deployed yet.
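A rough sketch of the invocation (the file names are placeholders, and I
am assuming simulate-pe accepts the decoder's -f flag for the moses.ini
file; check the binary's own help output for the exact usage):

```shell
# src.txt and ref.txt: tokenized, one sentence per line;
# aln.txt: corresponding word alignments in symal output format
simulate-pe -f moses.ini \
    --spe-src src.txt \
    --spe-trg ref.txt \
    --spe-aln aln.txt
```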

Regards - Uli



On Thu, Oct 23, 2014 at 1:47 AM, Sandipan Dandapat <
[email protected]> wrote:

> Dear Ulrich,
> I got your reference from Prashanta Mathur. I am a postdoctoral researcher
> in CNGL, DCU and  I am working with Moses incremental retraining. It will
> be great if you help me to understand couple of doubts:
>
> 1. I found there are 7 weights to define for PT0 (PT0 is the Mmsapt name)
> i.e.
>
> Mmsapt name=PT0 output-factor=0 num-features=7
> base=/home/sandipan/inc_retrain/MT_sys/En-Fr/dgt/50_i/mmsa_pt/train. L1=en
> L2=fr
> [weight]
> PT0= 0.1 0.2 0.3 0.4 0.5 0.6 0.7
>
> num-featues in PBSMT model is 4 which does not work with Mmsapt. What are
> these 7 weights? Can I use uniform weights for all 7 features? Or how do I
> adjust these values? Or, how to adjust these weights?
>
> 2. I found there is significant difference in BLEU score when I am using
> standard PBSMT model and when I am using MMST based model. Is this because
> of the weights I am using or am I doing something wrong?
>
> It will be real great help, if you help me to understand the above issue.
> Thanking you.
>
> Regards,
> sandipan
>
>
>


-- 
Ulrich Germann
Research Associate
School of Informatics
University of Edinburgh
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support