Hi Hieu,
Thanks for the prompt reply. I'm a little confused, though. The Moses
manual, p. 124, mentions that the microtag annotations (the '(', ')',
and '+' in a sequence like "NP( NP+ NP+ NP+ NP)") are supposed to trigger
chunk reduction rules like:
NP => NP [base case]
NP( NP) => NP [recursive case 1]
NP( NP+ ... NP+ NP) => NP( ... NP+ NP) [recursive case 2]
This last bit is also in the IRST LM tutorial (p. 32) at:
www.mt-archive.info/MTMarathon-2008-Bertoldi-ppt.pdf
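For concreteness, here is a minimal Python sketch of how I read those
reduction rules, assuming the microtags arrive as whitespace-separated
tokens like "NP(", "NP+", "NP)", and "NP". This is just my reading of the
manual, not code from Moses or IRST LM:

```python
def reduce_chunks(tags):
    """Collapse microtag sequences like 'NP( NP+ NP+ NP)' into 'NP'."""
    out = []
    for tag in tags:
        if tag.endswith("("):
            out.append(tag[:-1])   # chunk opens: keep the head label
        elif tag.endswith("+") or tag.endswith(")"):
            pass                   # chunk-internal or chunk-close: drop
        else:
            out.append(tag)        # plain tag: base case, keep as-is
    return out

print(reduce_chunks("NP( NP+ NP+ NP+ NP)".split()))       # ['NP']
print(reduce_chunks("NP( NP+ NP) NP( NP) VP".split()))    # ['NP', 'NP', 'VP']
```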
This isn't quite a skip n-gram model (if that's in fact what the
LanguageModelSkip branch uses). It's a way of annotating shallow syntactic
boundaries. E.g., the two following sequences:
"NP( NP+ NP) VP"
"NP( NP+ NP) NP( NP) VP"
would reduce to the same thing with a skip n-gram model (minus the '(' ')'
and '+' annotations) but to two distinct chunk sequences. Namely, using the
chunk reduction rules, we have:
"NP VP and NP NP VP"
whereas a skip n-gram model (as you've sketched it) over the sequences:
"NP NP NP VP"
and
"NP NP NP NP NP VP"
would give:
"NP VP" and "NP VP"
I can see how the former (the chunk reduction rules) could break the
substructure properties needed for efficient left-to-right decoding and
could interact badly with the stateful features of Moses, so I'm sure there
are good reasons not to use them by default, but the manual doesn't mention
this.
But my question goes beyond even the interaction of Moses and IRST LM. IRST
LM's documentation claims that it does these chunk reduction steps
automagically, but the little test on 'corp' above doesn't work. So either
(1) it doesn't do these things automagically, or (2) I'm missing some
command-line switch or training file (e.g., the class-based mapping file
which is mentioned -- well the words "mapped into" are mentioned -- in the
context of discussing chunk-based asynchronous factor models in the Moses
manual, p.124). This may not be your area of expertise.
I'll have a look at the LanguageModelSkip branch, to see what it does.
Thanks again.
--D.N.
On Fri, Jan 28, 2011 at 11:02 PM, Hieu Hoang <[email protected]> wrote:
> hi dennis
>
> if i understand correctly, the tag sequence
> NP NP NP VP ADVP
> should be scored by the LM as
> NP VP ADVP
>
> Moses and the LM aren't set up to do that. To do that
> 1. change the language model scoring algorithm. Something similar was
> tried a while ago. The code is in LanguageModelSkip
>
> 2. In the moses.ini file, tell moses the order of the LM is large
> (eg. 7). Otherwise the decoder will only give small bits of the target
> sentence for the LM to see.
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
>