Re: [Moses-support] extract.cpp: bug in lexicalized reordering extraction code?

Nadi Tomeh Thu, 12 Apr 2012 09:51:03 -0700

 Hi Dennis,

 Actually, this is not a bug; you have only misinterpreted the variables.

 The map inBottomRight stores the positions of the bottom right corners 
 of all extracted phrase pairs. Similarly, inBottomLeft stores the bottom 
 left corners of all extracted phrase pairs. (startF,startE) represents 
 the top left corner of the current phrase pair, while (endF,startE) 
 represents the top right corner.

 The orientation of the current phrase pair w.r.t. the previous phrase 
 pair is defined by comparing the top corners of the current phrase pair 
 with the bottom corners of all the other extracted phrase pairs. 
 Therefore, for a monotone orientation, we check whether there is a 
 phrase pair whose bottom right corner falls at (startF-1,startE-1), i.e. 
 touches the top left corner of the current phrase pair (startF,startE).
 For a swap orientation, we check whether there is a phrase pair whose 
 bottom left corner falls at (endF+1,startE-1), i.e. touches the top 
 right corner of the current phrase pair (endF,startE).
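 To make the corner arithmetic concrete, here is a minimal sketch of the 
 two lookups described above. It is not the code from extract.cpp: the 
 names CornerMap, addPhrasePair, hasCorner and orientation are 
 hypothetical, unit is fixed to 1, and the word-alignment checks 
 (connectedLeftTop, connectedRightTop) are left out. The only thing taken 
 from the quoted code is the lookup pattern, a map keyed by the E position 
 whose values are sets of F positions.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Corner map: E (target) position -> set of F (source) positions.
// The alias name is an assumption made for this sketch.
using CornerMap = std::map<int, std::set<int>>;

// Hypothetical helper: record the bottom corners of one extracted phrase pair
// that spans startF..endF on the source side and startE..endE on the target side.
void addPhrasePair(CornerMap& inBottomRight, CornerMap& inBottomLeft,
                   int startF, int endF, int startE, int endE) {
    assert(startF <= endF && startE <= endE);
    inBottomRight[endE].insert(endF);   // bottom right corner: (endF, endE)
    inBottomLeft[endE].insert(startF);  // bottom left corner:  (startF, endE)
}

// True if some recorded phrase pair has a corner at (f, e).
bool hasCorner(const CornerMap& corners, int e, int f) {
    auto it = corners.find(e);
    return it != corners.end() && it->second.count(f) > 0;
}

// Simplified orientation of the phrase pair whose top row is startE and
// whose source span is startF..endF (alignment-point checks omitted).
std::string orientation(const CornerMap& inBottomRight,
                        const CornerMap& inBottomLeft,
                        int startF, int endF, int startE) {
    // Monotone: a bottom right corner touches our top left corner.
    if (hasCorner(inBottomRight, startE - 1, startF - 1)) return "mono";
    // Swap: a bottom left corner touches our top right corner.
    if (hasCorner(inBottomLeft, startE - 1, endF + 1)) return "swap";
    return "other";
}
```

 For example, if a phrase pair covering F 2..3 and E 0..0 has been 
 recorded, then a current phrase pair covering F 0..1 with startE = 1 
 finds a bottom left corner at (endF+1,startE-1) = (2,0) and is labelled 
 as a swap in this sketch.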

 I hope this will clarify the situation.

 Cheers,
 Nadi

 On Wed, 11 Apr 2012 14:18:20 -0400, Dennis Mehay wrote:
> Hi all,
>
> I've been looking into the extraction code for lexicalized reordering
> models, and I can't seem to wrap my head around it. Looking at the
> phrase-based lexicalized reordering extraction code, e.g., I see that
> a phrase is considered monotone wrt the previous phrase iff there is
> a phrase consistent with the alignment (and within the phrase size
> limit) whose bottom right corner is adjacent to the top left corner
> of the phrase in question.
>
> From github, the relevant code for
> mosesdecoder/scripts/training/phrase-extract/extract.cpp is:
>
> -----------------------------
>
>   if((connectedLeftTop && !connectedRightTop) ||
>  ...
>
>       ((it = inBottomRight.find(startE - unit)) != inBottomRight.end() &&
>        it->second.find(startF - unit) != it->second.end()))
>     return LEFT;
> ------------------------------
>
> (This is assuming 'unit'=1, i.e., we are extracting wrt the previous
> phrase.)
>
> So far, so good.  Next, however, a 'swap' orientation is predicted
> if there is a phrase whose bottom left corner is adjacent to the
> bottom left corner of the current phrase.  That doesn't make sense.
> Shouldn't it be the *top right* of the adjacent phrase that is
> touching the bottom left of the current phrase?
>
> Again from github, the code here is:
> ------------------------------
>
>   if((!connectedLeftTop && connectedRightTop) ||
>       ((it = inBottomLeft.find(startE - unit)) != inBottomLeft.end() &&
>        it->second.find(endF + unit) != it->second.end()))
>     return RIGHT;
> ------------------------------
>
>  If this is correct, I don't understand it. If it isn't, that could
> explain why moses is giving negative weights to my own home-brewed
> 'swap' orientation feature (that I extract with another script) --
> i.e., my notion of a 'swap' and moses's notion of it differ in
> incompatible ways.
>
> Can anyone advise?
>
> --D.N.

-- 
 Nadi Tomeh
 Univ. Paris XI, LIMSI-CNRS
 LIMSI Bât 508, Bureau 118
 BP 133 F-91403 ORSAY CEDEX
 Tél : +33/0 1 69 85 80 68
 [email protected]
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
