Hi Dennis
Thanks, I've checked in the patch.
best regards - Barry
On Thursday 03 Feb 2011 05:04:45 Dennis Mehay wrote:
> Hello all,
>
> I'm interested in using the extended output search graph (osgx) output from
> Moses.
>
> First, I have a patch you might be interested in. When I printed out a few
> toy examples, I noticed that there was no mention of the input coverage
> of the output (as there *is* in the osg format), so I made a little patch
> that fixes that.
>
> Here's the diff:
>
> --- mosesdecoder/trunk/moses/src/Manager.cpp 2011-01-18 22:43:58.000000000 -0500
> +++ Manager.cpp 2011-01-18 22:59:11.000000000 -0500
> @@ -568,6 +568,10 @@
> StaticData::Instance().GetScoreIndexManager().PrintLabeledScores( outputSearchGraphStream, scoreBreakdown );
> outputSearchGraphStream << " ]";
>
> + // added this so that we will have the span in the input covered (why wasn't this in the extended format?)
> + // (DNM, 19 Nov 2010)
> + outputSearchGraphStream << " covered=" << searchNode.hypo->GetCurrSourceWordsRange().GetStartPos()
> + << "-" << searchNode.hypo->GetCurrSourceWordsRange().GetEndPos();
> outputSearchGraphStream << " out=" << searchNode.hypo->GetCurrTargetPhrase().GetStringRep(outputFactorOrder) << endl;
> }
>
> That seems to do it. You can of course omit my snide remarks and my
> initials from the patch, should you choose to use it.
>
> Also, I had a question. When toying around with the (patched) osgx output,
> I see that, ostensibly, all of the model component scores are mentioned.
> I wonder exactly what is being scored, though. First off, are these
> scores (when appropriate, e.g., the lm scores) based on what came "before"
> -- i.e., on the content of the nodes that these nodes point back to?
> Whether they are or not, I get strange results on a toy example I cooked
> up.
>
> Using the 197 sentence pairs in the europarl de-en corpus that meet the
> standard 80 word max cutoff (with aggressive tokenization of the German,
> but not of the English), I trained up a little model. Translating the
> sentence "das ist nicht schlecht ." (a silly sentence that I could, with
> my limited German, compose using the limited resources of the toy phrase
> table), gives an osgx file with the following entries in it (among
> others):
>
> ...
> 0 hyp=1 back=0 [ d: 0.000 w: -1.000 u: 0.000 d: -0.511 0.000 0.000 0.000 0.000 0.000 lm: -4.802 -100.000 tm: -2.398 0.000 -5.011 0.000 1.000 ] covered=0-0 out=that
> 0 hyp=6 back=0 [ d: 0.000 w: -1.000 u: 0.000 d: -1.609 0.000 0.000 0.000 0.000 0.000 lm: -4.627 -100.000 tm: -1.099 -5.088 -5.011 0.000 1.000 ] covered=0-0 out=this
> ...
>
> So far, so good. These two hypotheses translate the span 0-0 (i.e.,
> "das"), and they are at the beginning of the English output sentence
> (back=0, i.e., they point back to the initial, empty hypothesis). So,
> presumably, the first lm score (a word-based lm) should be a score over
> either "<s> that" (resp. "<s> this"), if this is a score based on the
> prior hypothesis that it points back to, or "that" (resp. "this"), if not.
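
For anyone who wants to inspect these entries programmatically, here is a minimal Python sketch that splits one osgx line into its fields. The layout it assumes is inferred only from the two entries quoted above (including the patched "covered=" field), not from the Moses source, and the names OSGX_LINE, parse_scores and parse_osgx_line are made up for illustration. It expects each hypothesis on a single line, so entries wrapped by a mail client need to be rejoined first.

```python
import re

# Assumed layout, inferred from the two quoted entries only:
#   <sent> hyp=<id> back=<id> [ <labelled scores> ] covered=<start>-<end> out=<phrase>
OSGX_LINE = re.compile(
    r"^(?P<sent>\d+) hyp=(?P<hyp>\d+) back=(?P<back>\d+) "
    r"\[ (?P<scores>.*?) \] "
    r"covered=(?P<start>\d+)-(?P<end>\d+) out=(?P<out>.*)$"
)


def parse_scores(text):
    """Group each score under the label that precedes it.

    Labels can repeat (the quoted entries contain "d:" twice), so the result
    is a list of (label, values) pairs rather than a dict.
    """
    groups = []
    for token in text.split():
        if token.endswith(":"):
            groups.append((token[:-1], []))
        elif not groups:
            raise ValueError("score found before any label: %r" % token)
        else:
            groups[-1][1].append(float(token))
    return groups


def parse_osgx_line(line):
    """Parse one osgx entry into a dict; raise ValueError if it does not fit."""
    match = OSGX_LINE.match(" ".join(line.split()))  # normalise whitespace
    if match is None:
        raise ValueError("line does not match the assumed osgx layout")
    return {
        "sentence": int(match.group("sent")),
        "hyp": int(match.group("hyp")),
        "back": int(match.group("back")),
        "scores": parse_scores(match.group("scores")),
        "covered": (int(match.group("start")), int(match.group("end"))),
        "out": match.group("out"),
    }


EXAMPLE = (
    "0 hyp=1 back=0 [ d: 0.000 w: -1.000 u: 0.000 d: -0.511 0.000 0.000 "
    "0.000 0.000 0.000 lm: -4.802 -100.000 tm: -2.398 0.000 -5.011 0.000 "
    "1.000 ] covered=0-0 out=that"
)

if __name__ == "__main__":
    node = parse_osgx_line(EXAMPLE)
    print(node["hyp"], node["back"], node["covered"], node["out"])
    for label, values in node["scores"]:
        print(label, values)
```

Keeping the scores as an ordered list of pairs is deliberate: a dict keyed on the label would silently merge or overwrite the two "d:" groups.
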
>
> But looking in the toy lm file, we see that:
>
> -2.001529 that -0.3822374
> ...
> -2.162679 this -0.3372842
> ...
> -2.085553 <s> that -0.1508171
> ...
> -2.009406 <s> this -0.01284565
>
> none of which gibes with what we see for the first of the two lm component
> scores in the osgx file.
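
One possible explanation, offered as an assumption rather than a confirmed description of what the decoder does: ARPA-format lm files store base-10 log probabilities, so if the osgx scores are natural logs, the file entries have to be multiplied by ln(10) before they can be compared. The short Python check below does that conversion for the four entries quoted above. The helper name log10_to_ln is made up for illustration.

```python
import math

# log10 probabilities copied from the quoted toy lm file
ARPA_LOG10 = {
    "that": -2.001529,
    "this": -2.162679,
    "<s> that": -2.085553,
    "<s> this": -2.009406,
}

# First lm score of the two quoted osgx entries
OSGX_FIRST_LM = {"that": -4.802, "this": -4.627}


def log10_to_ln(log10_prob):
    """Convert a base-10 log probability to a natural-log probability."""
    return log10_prob * math.log(10)


if __name__ == "__main__":
    for word, reported in OSGX_FIRST_LM.items():
        unigram = log10_to_ln(ARPA_LOG10[word])
        bigram = log10_to_ln(ARPA_LOG10["<s> " + word])
        print("%-5s osgx=%.3f  ln p(w)=%.3f  ln p(w|<s>)=%.3f"
              % (word, reported, unigram, bigram))
```

Under this assumption the bigram entries come out at about -4.802 for "<s> that" and -4.627 for "<s> this", which agree with the first lm score of each entry to three decimals, while the unigram entries do not. That would be consistent with the score being conditioned on the preceding hypothesis (here just "<s>"), though the numbers alone do not prove that this is how the decoder computes it.
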
>
> Does anyone know the gory details of the osg(x) file output well enough to
> advise?
>
> Best,
> D.N. ("Dennis")
>
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support