ah, it looks like a feature implemented specifically by IRSTLM. The IRST
guys might be able to enlighten you on it shortly
On 29/01/2011 11:59, Dennis Mehay wrote:
Hi Hieu,
Thanks for the prompt reply. I'm a little confused, though. The
Moses manual, p. 124, mentions that the microtag annotations (e.g., the
'(', ')' and '+' in the sequence "NP( NP+ NP+ NP+ NP)") are
supposed to trigger chunk reduction rules like:
NP => NP [base case]
NP( NP) => NP [recursive case 1]
NP( NP+ ... NP+ NP) => NP( ... NP+ NP) [recursive case 2]
This last bit is also in the IRST LM tutorial (p. 32)
at:www.mt-archive.info/MTMarathon-2008-Bertoldi-ppt.pdf
<http://www.mt-archive.info/MTMarathon-2008-Bertoldi-ppt.pdf>
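To make sure we're talking about the same thing, here is a toy Python
sketch of how I read those reduction rules (this is just my own
interpretation of the manual, not anything taken from IRSTLM's code):

    def reduce_chunks(tokens):
        # My reading of the rules on p. 124 of the Moses manual:
        # "NP(" opens a chunk, "NP+" continues it, "NP)" closes it,
        # and the span "NP( NP+ ... NP+ NP)" reduces to the label "NP".
        out = []
        i = 0
        while i < len(tokens):
            tok = tokens[i]
            if tok.endswith('('):       # chunk-opening microtag, e.g. "NP("
                label = tok[:-1]
                i += 1
                while i < len(tokens) and tokens[i] == label + '+':
                    i += 1              # recursive case 2: drop internal "NP+"
                if i < len(tokens) and tokens[i] == label + ')':
                    i += 1              # recursive case 1: "NP( NP)" => "NP"
                out.append(label)
            else:                       # base case: already a bare tag
                out.append(tok)
                i += 1
        return out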
This isn't quite a skip n-gram model (if that's in fact what the
LanguageModelSkip branch uses). It's a way of annotating shallow
syntactic boundaries. E.g., the following two sequences:
"NP( NP+ NP) VP"
"NP( NP+ NP) NP( NP) VP"
would reduce to the same thing with a skip n-gram model (minus the '(',
')' and '+' annotations) but to two distinct chunk sequences. Namely,
using the chunk reduction rules, we have:
"NP VP" and "NP NP VP"
whereas a skip n-gram model (as you've sketched it) over the sequences:
"NP NP NP VP"
and
"NP NP NP NP NP VP"
would give:
"NP VP" and "NP VP"
I can see how the former (the chunk reduction rules) could break the
substructure properties needed for efficient left-to-right decoding and
interact badly with the stateful features of Moses, so I'm sure there
are really good reasons not to use them by default, but the manual
doesn't mention this.
But my question goes beyond even the interaction of Moses and IRST
LM. IRST LM's documentation claims that it does these chunk reduction
steps automagically, but the little test on 'corp' above doesn't
work. So either (1) it doesn't do these things automagically, or (2)
I'm missing some command-line switch or training file (e.g., the
class-based mapping file that is mentioned -- well, the words "mapped
into" are mentioned -- in the discussion of chunk-based
asynchronous factor models in the Moses manual, p. 124). This may not
be your area of expertise.
I'll have a look at the LanguageModelSkip branch, to see what it does.
Thanks again.
--D.N.
On Fri, Jan 28, 2011 at 11:02 PM, Hieu Hoang <[email protected]> wrote:
hi dennis
if i understand correctly, the tag sequence
NP NP NP VP ADVP
should be scored by the LM as
NP VP ADVP
Moses and the LM aren't set up to do that. To do that:
1. change the language model scoring algorithm. Something similar was
tried a while ago; the code is in LanguageModelSkip
2. In the moses.ini file, tell moses the order of the LM is large
(e.g. 7). Otherwise the decoder will only give small bits of the target
sentence for the LM to see.
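for example, if i remember the moses.ini format right, the third field
of a [lmodel-file] entry is the order, so something like the line below
(the path is just a placeholder; the fields are type, factor, order,
filename, with type 1 = IRSTLM):

    [lmodel-file]
    1 1 7 /path/to/chunk-tags.lm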
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support