Ah, it looks like a feature implemented specifically by IRSTLM. The IRST guys might be able to enlighten you on it shortly.

On 29/01/2011 11:59, Dennis Mehay wrote:
Hi Hieu,

Thanks for the prompt reply. I'm a little confused, though. The Moses manual, p.124, mentions that the microtag annotations (e.g., the '(', ')' and '+' in the sequence "NP( NP+ NP+ NP+ NP)") are supposed to trigger chunk reduction rules like:

NP => NP                                   [base case]
NP( NP) => NP                              [recursive case 1]
NP( NP+ ... NP+ NP) => NP( ... NP+ NP)     [recursive case 2]

This last bit is also in the IRST LM tutorial (p. 32) at: http://www.mt-archive.info/MTMarathon-2008-Bertoldi-ppt.pdf
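
Just to check that I'm reading those rules right, here's a toy sketch of what I understand the reduction to do (illustrative Python only, not IRSTLM's actual implementation; the function name and the way I strip the '(' / '+' / ')' suffixes are my own):

def reduce_chunks(tags):
    # Collapse a microtag sequence into chunk labels,
    # e.g. ['NP(', 'NP+', 'NP+', 'NP)'] -> ['NP'].
    out, current = [], None
    for t in tags:
        if t.endswith('('):      # 'NP(' opens a chunk; remember its label
            current = t[:-1]
        elif t.endswith('+'):    # 'NP+' continues the open chunk; drop it
            continue
        elif t.endswith(')'):    # 'NP)' closes the chunk; emit one label
            out.append(current if current is not None else t[:-1])
            current = None
        else:                    # a bare tag like 'VP' is already one chunk
            out.append(t)
    return out

print(reduce_chunks("NP( NP+ NP+ NP+ NP)".split()))  # ['NP']
print(reduce_chunks("NP( NP+ NP) VP".split()))       # ['NP', 'VP']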

This isn't quite a skip n-gram model (if that's in fact what the LanguageModelSkip branch uses). It's a way of annotating shallow syntactic boundaries. E.g., the following two sequences:

"NP( NP+ NP) VP"
"NP( NP+ NP) NP( NP) VP"

would reduce to the same thing with a skip n-gram model (minus the '(', ')' and '+' annotations) but to two distinct chunk sequences. Namely, using the chunk reduction rules, we have:
"NP VP" and "NP NP VP"

whereas a skip n-gram model (as you've sketched it) over the sequences:
"NP NP NP VP"
and
"NP NP NP NP NP VP"

would give:
"NP VP" and "NP VP"

I can see how the former (the chunk reduction rules) would interact badly with the stateful features of Moses and break the substructure properties needed for efficient left-to-right decoding, so I'm sure there are really good reasons not to use them by default, but the manual doesn't mention this.

But my question goes beyond even the interaction of Moses and IRST LM. IRST LM's documentation claims that it does these chunk reduction steps automagically, but the little test on 'corp' above doesn't work. So either (1) it doesn't do these things automagically, or (2) I'm missing some command-line switch or training file (e.g., the class-based mapping file which is mentioned -- well the words "mapped into" are mentioned -- in the context of discussing chunk-based asynchronous factor models in the Moses manual, p.124). This may not be your area of expertise.

I'll have a look at the LanguageModelSkip branch to see what it does.

Thanks again.

--D.N.

On Fri, Jan 28, 2011 at 11:02 PM, Hieu Hoang <[email protected]> wrote:

    Hi Dennis,

    If I understand correctly, the tag sequence
       NP NP NP VP ADVP
    should be scored by the LM as
       NP VP ADVP

    Moses and the LM aren't set up to do that. To do that you would need to:

       1. Change the language model scoring algorithm. Something similar was
    tried a while ago; the code is in LanguageModelSkip.

       2. In the moses.ini file, tell Moses that the order of the LM is large
    (e.g. 7). Otherwise the decoder will only give the LM small bits of the
    target sentence to see (example line below).
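
    For example, in moses.ini the LM order is the third field of the
    [lmodel-file] entry, so roughly something like this (the implementation
    code, factor and path below are placeholders you would adjust for your
    own setup):

       [lmodel-file]
       1 0 7 /path/to/your/tag.lm

    where the fields are, if I remember correctly, LM implementation
    (1 = IRSTLM), factor, order and filename.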

