Ah, it looks like a feature implemented specifically by IRSTLM. The IRST guys might be able to enlighten you on it shortly.

On 29/01/2011 11:59, Dennis Mehay wrote:
Hi Hieu,

Thanks for the prompt reply. I'm a little confused, though. The Moses manual, p.124, mentions that the microtag annotations (e.g., the '(', ')' and '+' in the sequence "NP( NP+ NP+ NP+ NP)") are supposed to trigger chunk reduction rules like:

NP => NP                                   [base case]
NP( NP) => NP                              [recursive case 1]
NP( NP+ ... NP+ NP) => NP( ... NP+ NP)     [recursive case 2]

This last bit is also in the IRST LM tutorial (p. 32) at: http://www.mt-archive.info/MTMarathon-2008-Bertoldi-ppt.pdf
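
Just to check that I'm reading those rules right, here's a toy sketch of what I understand the reduction to do (illustrative Python only, not IRSTLM's actual implementation; the function name and the way I strip the '(' / '+' / ')' suffixes are my own):

def reduce_chunks(tags):
    # Collapse a microtag sequence into chunk labels,
    # e.g. ['NP(', 'NP+', 'NP+', 'NP)'] -> ['NP'].
    out, current = [], None
    for t in tags:
        if t.endswith('('):      # 'NP(' opens a chunk; remember its label
            current = t[:-1]
        elif t.endswith('+'):    # 'NP+' continues the open chunk; drop it
            continue
        elif t.endswith(')'):    # 'NP)' closes the chunk; emit one label
            out.append(current if current is not None else t[:-1])
            current = None
        else:                    # a bare tag like 'VP' is already one chunk
            out.append(t)
    return out

print(reduce_chunks("NP( NP+ NP+ NP+ NP)".split()))  # ['NP']
print(reduce_chunks("NP( NP+ NP) VP".split()))       # ['NP', 'VP']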

This isn't quite a skip n-gram model (if that's in fact what the LanguageModelSkip branch uses). It's a way of annotating shallow syntactic boundaries. E.g., the following two sequences:

"NP( NP+ NP) VP"
"NP( NP+ NP) NP( NP) VP"

would reduce to the same thing with a skip n-gram model (minus the '(', ')' and '+' annotations) but to two distinct chunk sequences. Namely, using the chunk reduction rules, we have:
"NP VP" and "NP NP VP"

whereas a skip n-gram model (as you've sketched it) over the sequences:
"NP NP NP VP"
and
"NP NP NP NP NP VP"

would give:
"NP VP" and "NP VP"

I can see how the former (the chunk reduction rules) would interact badly with the stateful features of Moses and break the substructure properties needed for efficient left-to-right decoding, so I'm sure there are really good reasons not to use them by default, but the manual doesn't mention this.

But my question goes beyond even the interaction of Moses and IRST LM. IRST LM's documentation claims that it does these chunk reduction steps automagically, but the little test on 'corp' above doesn't work. So either (1) it doesn't do these things automagically, or (2) I'm missing some command-line switch or training file (e.g., the class-based mapping file which is mentioned -- well the words "mapped into" are mentioned -- in the context of discussing chunk-based asynchronous factor models in the Moses manual, p.124). This may not be your area of expertise.

I'll have a look at the LanguageModelSkip branch to see what it does.

Thanks again.

--D.N.

On Fri, Jan 28, 2011 at 11:02 PM, Hieu Hoang <[email protected]> wrote:

    Hi Dennis,

    If I understand correctly, the tag sequence
       NP NP NP VP ADVP
    should be scored by the LM as
       NP VP ADVP

    Moses and the LM aren't set up to do that. To do that you would need to:

       1. Change the language model scoring algorithm. Something similar was
    tried a while ago; the code is in LanguageModelSkip.

       2. In the moses.ini file, tell Moses that the order of the LM is large
    (e.g. 7). Otherwise the decoder will only give the LM small bits of the
    target sentence to see (example line below).
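
    For example, in moses.ini the LM order is the third field of the
    [lmodel-file] entry, so roughly something like this (the implementation
    code, factor and path below are placeholders you would adjust for your
    own setup):

       [lmodel-file]
       1 0 7 /path/to/your/tag.lm

    where the fields are, if I remember correctly, LM implementation
    (1 = IRSTLM), factor, order and filename.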

