Hi Dennis,

for consistency, and to be compatible with Pharaoh, the LM scores are converted from log10 to natural log (multiply by ln(10) ~= 2.303):
   -2.085553 --> -4.802
   -2.009406 --> -4.627

On 03/02/2011 12:04, Dennis Mehay wrote:
Hello all,

I'm interested in using the extended output search graph (osgx) output from Moses.

First, I have a patch you might be interested in. When I printed out a few toy examples, I noticed that there was no mention of the input coverage of the output (as there *is* in the osg format), so I made a little patch that fixes that.

Here's the diff:

--- mosesdecoder/trunk/moses/src/Manager.cpp 2011-01-18 22:43:58.000000000 -0500
+++ Manager.cpp 2011-01-18 22:59:11.000000000 -0500
@@ -568,6 +568,10 @@
StaticData::Instance().GetScoreIndexManager().PrintLabeledScores( outputSearchGraphStream, scoreBreakdown );
        outputSearchGraphStream << " ]";

+ // added this so that we will have the span in the input covered (why wasn't this in the extended format?)
+       // (DNM, 19 Nov 2010)
+ outputSearchGraphStream << " covered=" << searchNode.hypo->GetCurrSourceWordsRange().GetStartPos()
+ << "-" << searchNode.hypo->GetCurrSourceWordsRange().GetEndPos();
outputSearchGraphStream << " out=" << searchNode.hypo->GetCurrTargetPhrase().GetStringRep(outputFactorOrder) << endl;
 }

That seems to do it. You can of course omit my snide remarks and my initials from the patch, should you choose to use it.

Also, I had a question. When toying around with the (patched) osgx output, I see that, ostensibly, all of the model component scores are mentioned. I wonder exactly what is being scored, though. First off, are these scores (when appropriate, e.g., the lm scores) based on what came "before" -- i.e., on the content of the nodes that these nodes point back to? Whether they are or not, I get strange results on a toy example I cooked up.

Using the 197 sentence pairs in the europarl de-en corpus that meet the standard 80 word max cutoff (with aggressive tokenization of the German, but not of the English), I trained up a little model. Translating the sentence "das ist nicht schlecht ." (a silly sentence that I could, with my limited German, compose using the limited resources of the toy phrase table), gives an osgx file with the following entries in it (among others):

...
0 hyp=1 back=0 [ d: 0.000 w: -1.000 u: 0.000 d: -0.511 0.000 0.000 0.000 0.000 0.000 lm: -4.802 -100.000 tm: -2.398 0.000 -5.011 0.000 1.000 ] covered=0-0 out=that
0 hyp=6 back=0 [ d: 0.000 w: -1.000 u: 0.000 d: -1.609 0.000 0.000 0.000 0.000 0.000 lm: -4.627 -100.000 tm: -1.099 -5.088 -5.011 0.000 1.000 ] covered=0-0 out=this
...

So far, so good. These two hypotheses translate the span 0-0 (i.e., "das"), and they are at the beginning of the English output sentence (back=0, i.e., they point back to the initial, empty hypothesis). So, presumably, the first lm score (from the word-based LM) should be a score over either "<s> that" (resp. "<s> this"), if the score conditions on the prior hypothesis that it points back to, or just "that" (resp. "this"), if not.

But looking in the toy lm file, we see that:

-2.001529    that        -0.3822374
...
-2.162679    this        -0.3372842
...
-2.085553    <s> that    -0.1508171
...
-2.009406    <s> this    -0.01284565

none of which jibes with what we see for the first of the two lm component scores in the osgx file.

Does anyone know the gory details of the osg(x) file output enough to advise?

Best,
D.N. ("Dennis")


_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support