Hi Dennis

Thanks, I've checked in the patch.

best regards - Barry

On Thursday 03 Feb 2011 05:04:45 Dennis Mehay wrote:
> Hello all,
> 
> I'm interested in using the extended output search graph (osgx) output from
> Moses.
> 
> First, I have a patch you might be interested in.  When I printed out a few
> toy examples, I noticed that there was no mention of the input coverage
> of the output (as there *is* in the osg format), so I made a little patch
> that fixes that.
> 
> Here's the diff:
> 
> --- mosesdecoder/trunk/moses/src/Manager.cpp    2011-01-18 22:43:58.000000000 -0500
> +++ Manager.cpp 2011-01-18 22:59:11.000000000 -0500
> @@ -568,6 +568,10 @@
>         StaticData::Instance().GetScoreIndexManager().PrintLabeledScores( outputSearchGraphStream, scoreBreakdown );
>         outputSearchGraphStream << " ]";
> 
> +       // added this so that we will have the span in the input covered (why wasn't this in the extended format?)
> +       // (DNM, 19 Nov 2010)
> +       outputSearchGraphStream << " covered=" << searchNode.hypo->GetCurrSourceWordsRange().GetStartPos()
> +                               << "-" << searchNode.hypo->GetCurrSourceWordsRange().GetEndPos();
>         outputSearchGraphStream << " out=" << searchNode.hypo->GetCurrTargetPhrase().GetStringRep(outputFactorOrder) << endl;
>  }
> 
> That seems to do it.  You can of course omit my snide remarks and my
> initials from the patch, should you choose to use it.
> 
> Also, I have a question.  When toying around with the (patched) osgx output,
> I see that, ostensibly, all of the model component scores are listed.  I
> wonder exactly what is being scored, though.  First off, are these scores
> (where appropriate, e.g., the lm scores) based on what came "before", i.e.,
> on the content of the hypotheses that these nodes point back to?  Either
> way, I get strange results on a toy example I cooked up.
> 
> Using the 197 sentence pairs in the europarl de-en corpus that meet the
> standard 80-word max cutoff (with aggressive tokenization of the German,
> but not of the English), I trained up a little model.  Translating the
> sentence "das ist nicht schlecht ." (a silly sentence that I could, with my
> limited German, compose using the limited resources of the toy phrase
> table) gives an osgx file with the following entries in it (among others):
> 
> ...
> 0 hyp=1 back=0 [ d: 0.000 w: -1.000 u: 0.000 d: -0.511 0.000 0.000 0.000 0.000 0.000 lm: -4.802 -100.000 tm: -2.398 0.000 -5.011 0.000 1.000 ] covered=0-0 out=that
> 0 hyp=6 back=0 [ d: 0.000 w: -1.000 u: 0.000 d: -1.609 0.000 0.000 0.000 0.000 0.000 lm: -4.627 -100.000 tm: -1.099 -5.088 -5.011 0.000 1.000 ] covered=0-0 out=this
> ...
> 
> So far, so good.  These two hypotheses translate the span 0-0 (i.e.,
> "das"), and they are at the beginning of the English output sentence
> (back=0, i.e., they point back to the initial, empty hypothesis).  So,
> presumably, the first lm score (a word-based lm) should be a score over
> either "<s> that" (resp. "<s> this"), if this is a score based on the prior
> hypothesis it points back to, or "that" (resp. "this"), if not.
> 
> But looking in the toy lm file, we see that:
> 
> -2.001529       that    -0.3822374
> ...
> -2.162679       this    -0.3372842
> ...
> -2.085553       <s> that        -0.1508171
> ...
> -2.009406       <s> this        -0.01284565
> 
> none of which jibes with what we see for the first of the two lm component
> scores in the osgx file.
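> 
> (One guess I have not been able to rule out: if Moses stores the LM feature
> as a natural log, i.e. it multiplies the ARPA log10 values by ln(10), then
> the two bigram entries above would come out to -4.802 and -4.627, which is
> exactly what the osgx file shows.  A two-line check, again just my own
> sketch:)
> 
> #include <cmath>
> #include <cstdio>
> 
> int main() {
>   const double ln10 = std::log(10.0);                   // ~2.302585
>   // log10 scores copied from the toy ARPA file above
>   std::printf("<s> that -> %.3f\n", -2.085553 * ln10);  // prints -4.802
>   std::printf("<s> this -> %.3f\n", -2.009406 * ln10);  // prints -4.627
>   return 0;
> }
> 
> (If that reading is right, the first lm component does take the <s> context
> from the hypothesis it points back to, but I would like confirmation from
> someone who knows the code.)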
> 
> Does anyone know the gory details of the osg(x) file output well enough to
> advise?
> 
> Best,
> D.N. ("Dennis")
> 

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
