Hello all,
So I thought I had the chunk LM thing licked (thanks to Mauro Cettolo and
Nicola Bertoldi), but apparently it still doesn't work.
Here's what I've done (with a fresh install of moses):
(1) Build a translation table with a single entry:
das ist ein Test . ||| this#S[dcl]/NP( is#S[dcl]/NP) a#NP( test#NP+ .#NP) ||| 1 1 1 1 1 ||| ||| 100 100
For the uninitiated: the ...#C( ...#C+ and ...#C) suffixes are annotations
that IRST LM in collapsing mode uses to construct an asynchronous sequence
of syntactic labels 'C'. In this case, "S[dcl]/NP( S[dcl]/NP) NP( NP+ NP)"
collapses to "S[dcl]/NP NP". ("Asynchronous" because the label sequence is
not in sync with the number of words, I suppose.)
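To make the collapsing concrete, here is a minimal Python sketch of how I understand it. It is not IRST LM's actual code: the function name is made up, and treating a tag with no trailing marker as a single-word chunk is my assumption.

```python
def collapse_chunk_labels(sentence, field=1, sep="#"):
    """Collapse per-word chunk annotations into one label per chunk.

    Markers, as described in the post: 'C(' opens a chunk, 'C+' continues
    it, 'C)' closes it. One label 'C' is emitted when a chunk closes.
    """
    collapsed = []
    for token in sentence.split():
        tag = token.split(sep)[field]
        if tag.endswith(")"):
            # End of a chunk: emit its label without the marker.
            collapsed.append(tag[:-1])
        elif tag.endswith("(") or tag.endswith("+"):
            # Chunk still open: nothing to emit yet.
            continue
        else:
            # Assumption: an unmarked tag is a single-word chunk.
            collapsed.append(tag)
    return collapsed


if __name__ == "__main__":
    hyp = "this#S[dcl]/NP( is#S[dcl]/NP) a#NP( test#NP+ .#NP)"
    print(" ".join(collapse_chunk_labels(hyp)))
```

On the entry above this yields the two labels S[dcl]/NP and NP, which is the sequence the chunk LM should see between <s> and </s>.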
(2) I have a language model that contains the following entry (among many
others):
...
\4-grams
...
-1.690494 <s> S[dcl]/NP NP </s>
...
(3) I created config files for both LMs: the word-based LM, which scores
sequences over the first field (field 0 in the config, i.e., what comes
before the '#' in each target token), and the syntactic chunk LM, which
scores collapsed sequences over the second field (field 1 in the config).
Word LM config:
--------------
LMMACRO 5 0 false
/path/to/lm/interp.ord5.lm.en.blm
--------------
Chunk LM config:
--------------
LMMACRO 28 1 true
/path/to/lm/chunk.4gram.blm
/path/to/lm/chunk.4gram.map
--------------
These are given to moses in its config file.
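As a rough illustration of how I read these two configs, the Python sketch below parses the header line and pulls the selected field out of each token. The names MacroConfig, parse_macro_config and select_field are hypothetical; I leave the first number uninterpreted (called "size" here), and reading the second and third values as field index and collapse flag is my interpretation, consistent with the "selected field n. 1" and "collapse is enabled" lines that compile-lm prints.

```python
from typing import NamedTuple, Optional


class MacroConfig(NamedTuple):
    size: int                 # first number on the header line (not interpreted here)
    field: int                # zero-based field index within each '#'-separated token
    collapse: bool            # whether chunk labels are collapsed
    lm_path: str
    map_path: Optional[str]   # only present in the chunk LM config


def parse_macro_config(text):
    """Parse a config of the shape shown above (header line, then paths)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    keyword, size, field, collapse = lines[0].split()
    if keyword != "LMMACRO":
        raise ValueError("expected an LMMACRO header, got %r" % keyword)
    map_path = lines[2] if len(lines) > 2 else None
    return MacroConfig(int(size), int(field), collapse == "true",
                       lines[1], map_path)


def select_field(sentence, field, sep="#"):
    """Return the chosen field of every token in the sentence."""
    return [tok.split(sep)[field] for tok in sentence.split()]


if __name__ == "__main__":
    cfg = parse_macro_config(
        "LMMACRO 28 1 true\n"
        "/path/to/lm/chunk.4gram.blm\n"
        "/path/to/lm/chunk.4gram.map\n"
    )
    hyp = "this#S[dcl]/NP( is#S[dcl]/NP) a#NP( test#NP+ .#NP)"
    print(cfg)
    print(select_field(hyp, 0))          # what the word LM should score
    print(select_field(hyp, cfg.field))  # what the chunk LM sees before collapsing
```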
What should happen -- as I understand it -- is that moses should produce the
translation "this#S[dcl]/NP( is#S[dcl]/NP) a#NP( test#NP+ .#NP)" with two LM
scores that make sense for the sequences "<s> this is a test . </s>" and
"<s> S[dcl]/NP NP </s>", but instead in the n-best file I get:
0 ||| this#S[dcl]/NP( is#S[dcl]/NP) a#NP( test#NP+ .#NP) ||| d: 0 lm: -16.2291 -13.4335 w: -5 tm: 0 0 0 0 0 ||| -9.83129
...
which makes no sense if IRST LM and moses are behaving as advertised.
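For anyone who wants to pull the two LM scores out of the n-best file programmatically, here is a small Python sketch. It assumes the layout shown above (four sections separated by "|||", with feature names ending in a colon); the function name is made up.

```python
def parse_nbest_line(line):
    """Split one n-best line into id, hypothesis, feature scores and total."""
    sent_id, hypothesis, features, total = [
        part.strip() for part in line.split("|||")
    ]
    scores = {}
    name = None
    for tok in features.split():
        if tok.endswith(":"):
            # A token such as 'lm:' starts a new feature group.
            name = tok[:-1]
            scores[name] = []
        else:
            scores[name].append(float(tok))
    return int(sent_id), hypothesis, scores, float(total)


if __name__ == "__main__":
    line = ("0 ||| this#S[dcl]/NP( is#S[dcl]/NP) a#NP( test#NP+ .#NP) "
            "||| d: 0 lm: -16.2291 -13.4335 w: -5 tm: 0 0 0 0 0 ||| -9.83129")
    _, _, scores, _ = parse_nbest_line(line)
    word_lm, chunk_lm = scores["lm"]
    print("word LM:", word_lm, "chunk LM:", chunk_lm)
```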
For what it's worth, at the command line, the IRST LM 'compile-lm' tool
works properly:
---------------------------------------------
$ echo "<s> this#S[dcl]/NP( is#S[dcl]/NP) a#NP( test#NP+ .#NP) </s>" > test.txt
$ compile-lm --eval test.txt chunk.config --debug 5
inpfile: chunk.config
dub: 10000000
Language Model Type of chunk.config is 2
selected field n. 1
collapse is enabled
lmfilename:/path/to/lm/chunk.4gram.blm
mapfilename:/path/to/lm/chunk.4gram.map
blmt
loadbin()
loading 6165 1-grams
loading 295387 2-grams
loading 5439440 3-grams
loading 54727737 4-grams
done
OOV code is 6165
Reading map /path/to/lm/chunk.4gram.map...
...done
OOV code is 18497
lmtable has direct ngrams
Start Eval
OOV code: 18497
<s> this#S[dcl]/NP( 1 [2-gram: recombine:1 state:0x2194cb8] -1.92622101
bow:0.00000000
[t=0.01501625] POSSIBLE ERROR
<s> this#S[dcl]/NP( is#S[dcl]/NP) 1 [3-gram: recombine:1 state:0x2194cb8]
0.00000000 bow:0.00000000
[t=1.00745490] POSSIBLE ERROR
<s> this#S[dcl]/NP( is#S[dcl]/NP) a#NP( 1 [4-gram: recombine:2
state:0x7f756daa3e6a] -0.74154949 bow:0.00000000
[t=0.18886188] POSSIBLE ERROR
<s> this#S[dcl]/NP( is#S[dcl]/NP) a#NP( test#NP+ 1 [5-gram: recombine:2
state:0x7f756daa3e6a] 0.00000000 bow:0.00000000
[t=1.06001705] POSSIBLE ERROR
<s> this#S[dcl]/NP( is#S[dcl]/NP) a#NP( test#NP+ .#NP) 1 [6-gram:
recombine:2 state:0x7f756daa3e6a] 0.00000000 bow:0.00000000
[t=2.06001705] POSSIBLE ERROR
<s> this#S[dcl]/NP( is#S[dcl]/NP) a#NP( test#NP+ .#NP) </s> 1 [7-gram:
recombine:3 state:0x7f756caec5a3] -1.69049394 bow:0.00000000
[t=0.07429103] POSSIBLE ERROR
%% Nw=6 PP=5.32570869 PPwp=0.00000000 Nbo=0 Noov=0 OOV=0.00000000%
logPr=-4.35826445
lmtable class statistics
levels 4
lev 1 entries 6165 used mem 0.09Mb
lev 2 entries 295387 used mem 4.23Mb
lev 3 entries 5439440 used mem 77.81Mb
lev 4 entries 54727737 used mem 365.35Mb
total allocated mem 447.47Mb
total number of get and binary search calls
level 1 get: 110990 bsearch: 0
level 2 get: 110994 bsearch: 259001
level 3 get: 92498 bsearch: 148007
level 4 get: 55509 bsearch: 55509
lmtable class statistics
levels 4
lev 1 entries 6165 used mem 0.09Mb
lev 2 entries 295387 used mem 4.23Mb
lev 3 entries 5439440 used mem 77.81Mb
lev 4 entries 54727737 used mem 365.35Mb
total allocated mem 447.47Mb
total number of get and binary search calls
level 1 get: 110990 bsearch: 0
level 2 get: 110994 bsearch: 259001
level 3 get: 92498 bsearch: 148007
level 4 get: 55509 bsearch: 55509
---------------------------------------------
Note the line (from above):
<s> this#S[dcl]/NP( is#S[dcl]/NP) a#NP( test#NP+ .#NP) </s> 1 [7-gram:
recombine:3 state:0x7f756caec5a3] -1.69049394 bow:0.00000000
The log-prob -1.69049394 matches the LM entry for the 4-gram "<s>
S[dcl]/NP NP </s>" shown in (2), so the chunk LM scores the collapsed
sequence correctly here.
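As a sanity check on the compile-lm output, the per-step log-probs can be summed and compared with the reported logPr and PP. The Python sketch below copies the numbers from the log above; that the values are base-10 logs is my assumption, although it does reproduce the printed perplexity.

```python
# Per-step log-probs copied from the compile-lm debug output above.
step_logprobs = [
    -1.92622101,  # <s> this#S[dcl]/NP(
    0.0,          # ... is#S[dcl]/NP)
    -0.74154949,  # ... a#NP(
    0.0,          # ... test#NP+
    0.0,          # ... .#NP)
    -1.69049394,  # ... </s>
]

n_words = 6                      # Nw reported by compile-lm
total = sum(step_logprobs)       # should be close to logPr=-4.35826445

# Assumption: base-10 logs, so perplexity is 10 ** (-logPr / Nw).
perplexity = 10 ** (-total / n_words)

print("logPr = %.8f" % total)
print("PP    = %.8f" % perplexity)
```

The sum agrees with the reported logPr up to rounding, and the perplexity agrees with the reported PP, so the command-line numbers are at least internally consistent.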
Does anyone see what I'm doing wrong here?
Best,
Dennis