Hi Kenneth,

For my experiments, decoding compound nouns with moses works fine when the phrase table assigns a high probability to the correct phrase. Otherwise, the process can be quite messy. Decoding nouns with moses_chart is sometimes more complicated, e.g. when the modifier and the head of a compound are swapped. Where necessary, I change the reordering weight to achieve better results. Compound verbs are even more difficult, as they sometimes split into two parts that take different positions in a sentence. I've had several ideas for further experiments:
1. adding special rules to the rule table of moses_chart
2. using a factored model
3. creating a second, separate LM containing only compound units
4. testing a hybrid SMT system, e.g. the one Groves and Way describe
5. resolving compound noun contractions by regex before decoding, similar to the method Schmid proposes for resolving verb contractions in English during tokenization
6. pre-editing
7. etc.

Best,
Daniel

-----Original Message-----
From: Kenneth Heafield [mailto:[email protected]]
Sent: 27 April 2012 17:20
To: Daniel Schaut
Cc: [email protected]
Subject: Re: [Moses-support] Higher BLEU/METEOR score than usual for EN-DE

Hi,

Since this is EN-DE, how are you processing German compounds?

Kenneth

On 04/27/2012 07:43 AM, Daniel Schaut wrote:
> Hi guys,
>
> Thank you for your comprehensive comments.
>
> The most likely thing is that you have some of your test set included
> in your training set
>
> Indeed, there are some similarities owing to the domain (instruction
> manuals). As is typical for all kinds of manuals, there is a high
> degree of similarity, e.g. at sub-segment level. I extracted test set A
> and the tuning sets from the whole corpus before training my engine to
> make sure that test set A doesn't interfere with the training set.
> Hmmm, that's an epic fail then. Test set B was provided at a much later
> stage, when the training process was already done.
>
> Did you try looking at the sentences? 1,000 is few enough to
> eyeball them. Have you tried the same system with a different corpus
> (e.g. EuroParl)? Have you checked that your test set and your training
> set do not intersect?
>
> Apart from scoring, I checked almost every sentence in both test sets
> for my thesis. The quality of the output is moderate for sentences of
> up to 50 words; everything beyond that is of lower quality. Sentences
> of up to 20 words, in particular, are good.
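Because instruction manuals repeat a lot of material at sub-segment level, an exact-match check alone may miss overlap between test and training data. Below is a minimal Python sketch; the helper names `exact_overlap` and `near_duplicates` are hypothetical and not part of Moses. It counts test sentences that occur verbatim in the training data after lowercasing and whitespace normalization. It also flags test sentences whose 4-grams mostly occur somewhere in the training data. It assumes both sides are already tokenized (punctuation split off), and the 0.7 threshold is an arbitrary illustrative choice.

```python
def normalize(sentence: str) -> str:
    """Lowercase and collapse whitespace so trivial differences don't hide a match."""
    return " ".join(sentence.lower().split())


def ngram_set(tokens: list[str], n: int) -> set[tuple[str, ...]]:
    """All distinct n-grams of a token list (empty if the sentence is shorter than n)."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def exact_overlap(train: list[str], test: list[str]) -> tuple[float, list[str]]:
    """Share of test sentences that also occur verbatim in training, plus those sentences."""
    train_set = {normalize(s) for s in train}
    hits = [s for s in test if normalize(s) in train_set]
    return (len(hits) / len(test) if test else 0.0), hits


def near_duplicates(train: list[str], test: list[str], n: int = 4,
                    threshold: float = 0.7) -> list[str]:
    """Test sentences whose n-grams are at least `threshold` covered by training n-grams."""
    train_ngrams: set[tuple[str, ...]] = set()
    for s in train:
        train_ngrams |= ngram_set(normalize(s).split(), n)
    flagged = []
    for s in test:
        grams = ngram_set(normalize(s).split(), n)
        if grams and len(grams & train_ngrams) / len(grams) >= threshold:
            flagged.append(s)
    return flagged


if __name__ == "__main__":
    # Invented toy data, pre-tokenized with spaces before punctuation.
    train = ["Press the power button to switch on the device .",
             "Remove the cover before cleaning ."]
    test = ["press the power button to switch on the device .",
            "Insert the battery .",
            "Press the power button to switch on the device again ."]
    rate, hits = exact_overlap(train, test)
    print(rate, hits)
    print(near_duplicates(train, test))
```

On a 1.8-million-sentence corpus the 4-gram set gets large. Hashing the n-grams, or restricting the check to n-grams that actually occur in the test set, would keep memory manageable.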
>
> I've just prepared a third and a fourth test set, from the OpenOffice
> corpus files and from another batch of in-domain files. On the OO
> files (2,000 sentences), BLEU is 0.0858 and METEOR is 0.3031. Kind of
> disappointing. The fourth test set of 2,000 sentences yields scores
> similar to those of the other in-domain test sets.
>
> Very short sentences will give you high scores.
>
> This might indeed be another factor boosting the scores. On average,
> almost half of the sentences in test sets A and B are quite short.
>
> To conclude, could one say that I've created an engine suited to a
> specific domain, but whose performance outside that domain is close
> to zero?
>
> Best,
>
> Daniel
>
> *From:* [email protected] [mailto:[email protected]] *On Behalf Of* Miles Osborne
> *Sent:* 26 April 2012 21:17
> *To:* John D Burger
> *Cc:* Daniel Schaut; [email protected]
> *Subject:* Re: [Moses-support] Higher BLEU/METEOR score than usual for
> EN-DE
>
> Very short sentences will give you high scores.
>
> Also, multiple references will boost them.
>
> Miles
>
> On Apr 26, 2012 8:13 PM, "John D Burger" <[email protected]
> <mailto:[email protected]>> wrote:
>
> I =think= I recall that pairwise BLEU scores for human translators are
> usually around 0.50, so anything much better than that is indeed suspect.
>
> - JB
>
> On Apr 26, 2012, at 14:18 , Daniel Schaut wrote:
>
> > Hi all,
> >
> > I'm running some experiments for my thesis, and I've been told by a
> > more experienced user that the BLEU/METEOR scores achieved by my MT
> > engine were too good to be true. Since this is the very first MT
> > engine I've ever built and I'm not experienced in interpreting
> > scores, I really don't know what to make of them. The first test set
> > achieves a BLEU score of 0.6508 (v13). METEOR's final score is 0.7055
> > (v1.3, exact, stem, paraphrase). A second test set yielded a slightly
> > lower BLEU score of 0.6267 and a METEOR score of 0.6748.
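Miles's point about multiple references follows from how BLEU is defined. Each n-gram count in the hypothesis is clipped against the largest count of that n-gram in any single reference. The geometric mean of the resulting precisions is then scaled by a brevity penalty. The Python sketch below is a simplified corpus-level BLEU written only for illustration. It has no smoothing and none of the mteval tokenization or normalization, so its numbers will not match the official scoring scripts. The example sentences are invented.

```python
import math
from collections import Counter


def ngrams(tokens: list[str], n: int) -> Counter:
    """Counts of all n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses: list[list[str]], references: list[list[list[str]]],
                max_n: int = 4) -> float:
    """Simplified BLEU: clipped n-gram precisions plus brevity penalty, no smoothing."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, refs in zip(hypotheses, references):
        hyp_len += len(hyp)
        # Effective reference length: the closest one (ties go to the shorter reference).
        ref_len += min((abs(len(r) - len(hyp)), len(r)) for r in refs)[1]
        for n in range(1, max_n + 1):
            max_ref_counts = Counter()
            for r in refs:
                max_ref_counts |= ngrams(r, n)  # elementwise maximum over references
            hyp_counts = ngrams(hyp, n)
            matches[n - 1] += sum(min(c, max_ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # some n-gram order has no match at all: unsmoothed BLEU is zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)


if __name__ == "__main__":
    hyp = "the cable is connected to the device".split()
    ref1 = "the cable is attached to the unit".split()
    ref2 = "the cable is connected to the unit".split()
    print(f"{corpus_bleu([hyp], [[ref1]]):.3f}")        # 0.000
    print(f"{corpus_bleu([hyp], [[ref1, ref2]]):.3f}")  # 0.809
```

With only the first reference, no 4-gram matches, so the score is zero. Adding a second, equally valid reference raises every clipped precision, and the score jumps to about 0.81. Short segments also have very few higher-order n-grams, so a handful of exact matches can dominate the precisions.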
> >
> > Here are some basic facts about my system:
> >
> > Decoding direction: EN-DE
> > Training corpus: 1.8 million sentences
> > Tuning runs: 5
> > Test sets: a) 2,000 sentences, b) 1,000 sentences (both in-domain)
> > LM type: trigram
> > TM type: unfactored
> >
> > I'm now trying to figure out whether these scores are realistic at
> > all, as various papers report far lower BLEU scores, e.g. Koehn and
> > Hoang 2011. Any comments regarding this decoding direction and the
> > related scores will be much appreciated.
> >
> > Best,
> >
> > Daniel
> >
> > _______________________________________________
> > Moses-support mailing list
> > [email protected] <mailto:[email protected]>
> > http://mailman.mit.edu/mailman/listinfo/moses-support
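Item 5 in Daniel's list proposes resolving compound contractions with a regex before decoding. For English source text, a rough sketch might look like the code below. It assumes that "contractions" means suspended hyphens such as "pre- and post-processing". The pattern and the function name `expand_suspended_hyphens` are hypothetical illustrations, not an existing Moses script. The pattern only handles two conjuncts joined by "and"/"or" where the second conjunct is itself hyphenated. It would miss three-way coordinations such as "pre-, in- and post-processing" and multi-word heads.

```python
import re

# Hypothetical pattern: "<prefix>- and|or <prefix>-<head>", e.g. "pre- and post-processing".
SUSPENDED_HYPHEN = re.compile(r"\b(\w+)- (and|or) (\w+)-(\w+)\b")


def expand_suspended_hyphens(text: str) -> str:
    """Copy the shared head from the second conjunct onto the truncated first one."""
    return SUSPENDED_HYPHEN.sub(r"\1-\4 \2 \3-\4", text)


if __name__ == "__main__":
    print(expand_suspended_hyphens("Check all pre- and post-processing steps."))
    print(expand_suspended_hyphens("Use only low- or high-voltage cables."))
```

Running it as a preprocessing step before tokenization would give the decoder the full forms ("pre-processing and post-processing"), which are more likely to have entries in the phrase table than the truncated "pre-".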
