The section
[non-terminals]
X
is only used for OOV handling, so you don't need to fill it out with all
the non-terminals. In fact, we're trying to get rid of it and replace it
with a list of non-terminals read from a file.
The phrase extraction script should create 2 additional files besides
the phrase table:
1. unknown-word-label. This looks like
ADJA 0.110016
NE 0.0785407
NN 0.680048
For OOV words, a new rule is created with these labels as the
left-hand side.
2. glue-grammar. This looks like
<s> [X] ||| <s> [Q] ||| ||| 1
[X][Q] </s> [X] ||| [X][Q] </s> [Q] ||| 0-0 ||| 1
[X][Q] [X][AA] [X] ||| [X][Q] [X][AA] [Q] ||| 0-0 1-1 ||| 2.718
.
.
The rules in the glue grammar have to be able to attach to the
start- and end-of-sentence symbols, <s> and </s>, and there must be a
glue rule for every single non-terminal type. Hopefully you'll see what
I mean if you look through the file.
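To make the pattern concrete, here is a minimal Python sketch that writes out glue rules in the format shown above for a given list of non-terminal labels. It is not the phrase extraction script's own code; the function name `glue_rules` and the `sentence_labels` parameter are made up for illustration, and the rule strings are simply copied from the examples in this message and the attached file.

```python
def glue_rules(labels, sentence_labels=()):
    """Return glue-grammar lines for the given non-terminal labels.

    labels          -- labels that get a concatenation rule ([X][Q] [X][LABEL])
    sentence_labels -- labels that also get a whole-sentence rule
                       (<s> [X][LABEL] </s>), as in the attached file
    """
    rules = [
        # Start of sentence: <s> becomes the glue symbol Q.
        "<s> [X] ||| <s> [Q] ||| ||| 1",
        # End of sentence: Q followed by </s> closes the derivation.
        "[X][Q] </s> [X] ||| [X][Q] </s> [Q] ||| 0-0 ||| 1",
    ]
    for label in sentence_labels:
        rules.append(
            f"<s> [X][{label}] </s> [X] ||| <s> [X][{label}] </s> [Q] ||| 1-1 ||| 1"
        )
    for label in labels:
        # One concatenation rule per non-terminal type.
        rules.append(
            f"[X][Q] [X][{label}] [X] ||| [X][Q] [X][{label}] [Q] ||| 0-0 1-1 ||| 2.718"
        )
    return rules


if __name__ == "__main__":
    for rule in glue_rules(["AA", "ADJA", "NN"], sentence_labels=["S"]):
        print(rule)
```

The list of labels would have to come from your own parsed training data; the sketch only shows that every label needs its own line.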
Attached are Phil's ini file, unknown-word and glue-grammar files for his
English-German system.
On 23/06/2010 17:14, Lucia Specia wrote:
Hi again,
I notice that I should probably have all non-terminals listed in the
moses.ini file. I only have the default 'X' there. Is this something
that has to be done or should the training script take care of it? I
have many non-terminals, they are all represented following the
specification on the website, e.g.:
<tree label="S"> <tree label="FCL"> <tree label="NP"> ....
Regarding the glue grammar, could you please give me some pointers on
what exactly has to be on it and what the format should be?
Thanks a lot,
Lucia
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
ADJA 0.110016
NE 0.0785407
NN 0.680048
<s> [X] ||| <s> [Q] ||| ||| 1
[X][Q] </s> [X] ||| [X][Q] </s> [Q] ||| 0-0 ||| 1
<s> [X][ADJA] </s> [X] ||| <s> [X][ADJA] </s> [Q] ||| 1-1 ||| 1
<s> [X][ADV] </s> [X] ||| <s> [X][ADV] </s> [Q] ||| 1-1 ||| 1
<s> [X][AP] </s> [X] ||| <s> [X][AP] </s> [Q] ||| 1-1 ||| 1
<s> [X][CH] </s> [X] ||| <s> [X][CH] </s> [Q] ||| 1-1 ||| 1
<s> [X][CNP] </s> [X] ||| <s> [X][CNP] </s> [Q] ||| 1-1 ||| 1
<s> [X][CO] </s> [X] ||| <s> [X][CO] </s> [Q] ||| 1-1 ||| 1
<s> [X][CPP] </s> [X] ||| <s> [X][CPP] </s> [Q] ||| 1-1 ||| 1
<s> [X][CS] </s> [X] ||| <s> [X][CS] </s> [Q] ||| 1-1 ||| 1
<s> [X][CVP] </s> [X] ||| <s> [X][CVP] </s> [Q] ||| 1-1 ||| 1
<s> [X][DL] </s> [X] ||| <s> [X][DL] </s> [Q] ||| 1-1 ||| 1
<s> [X][NE] </s> [X] ||| <s> [X][NE] </s> [Q] ||| 1-1 ||| 1
<s> [X][NN] </s> [X] ||| <s> [X][NN] </s> [Q] ||| 1-1 ||| 1
<s> [X][NP] </s> [X] ||| <s> [X][NP] </s> [Q] ||| 1-1 ||| 1
<s> [X][PN] </s> [X] ||| <s> [X][PN] </s> [Q] ||| 1-1 ||| 1
<s> [X][PP] </s> [X] ||| <s> [X][PP] </s> [Q] ||| 1-1 ||| 1
<s> [X][PUNC.] </s> [X] ||| <s> [X][PUNC.] </s> [Q] ||| 1-1 ||| 1
<s> [X][S] </s> [X] ||| <s> [X][S] </s> [Q] ||| 1-1 ||| 1
<s> [X][TOP] </s> [X] ||| <s> [X][TOP] </s> [Q] ||| 1-1 ||| 1
<s> [X][VP] </s> [X] ||| <s> [X][VP] </s> [Q] ||| 1-1 ||| 1
<s> [X][XY] </s> [X] ||| <s> [X][XY] </s> [Q] ||| 1-1 ||| 1
[X][Q] [X][AA] [X] ||| [X][Q] [X][AA] [Q] ||| 0-0 1-1 ||| 2.718
[X][Q] [X][ADJA] [X] ||| [X][Q] [X][ADJA] [Q] ||| 0-0 1-1 ||| 2.718
[X][Q] [X][ADJD] [X] ||| [X][Q] [X][ADJD] [Q] ||| 0-0 1-1 ||| 2.718
[X][Q] [X][ADV] [X] ||| [X][Q] [X][ADV] [Q] ||| 0-0 1-1 ||| 2.718
[X][Q] [X][AP] [X] ||| [X][Q] [X][AP] [Q] ||| 0-0 1-1 ||| 2.718
[X][Q] [X][APPO] [X] ||| [X][Q] [X][APPO] [Q] ||| 0-0 1-1 ||| 2.718
[X][Q] [X][APPR] [X] ||| [X][Q] [X][APPR] [Q] ||| 0-0 1-1 ||| 2.718
[X][Q] [X][APPRART] [X] ||| [X][Q] [X][APPRART] [Q] ||| 0-0 1-1 ||| 2.718
[X][Q] [X][APZR] [X] ||| [X][Q] [X][APZR] [Q] ||| 0-0 1-1 ||| 2.718
[X][Q] [X][ART] [X] ||| [X][Q] [X][ART] [Q] ||| 0-0 1-1 ||| 2.718
[X][Q] [X][AVP] [X] ||| [X][Q] [X][AVP] [Q] ||| 0-0 1-1 ||| 2.718
[X][Q] [X][CAC] [X] ||| [X][Q] [X][CAC] [Q] ||| 0-0 1-1 ||| 2.718
[X][Q] [X][CAP] [X] ||| [X][Q] [X][CAP] [Q] ||| 0-0 1-1 ||| 2.718
[X][Q] [X][CARD] [X] ||| [X][Q] [X][CARD] [Q] ||| 0-0 1-1 ||| 2.718
[X][Q] [X][CAVP] [X] ||| [X][Q] [X][CAVP] [Q] ||| 0-0 1-1 ||| 2.718
[X][Q] [X][CH] [X] ||| [X][Q] [X][CH] [Q] ||| 0-0 1-1 ||| 2.718
[X][Q] [X][CNP] [X] ||| [X][Q] [X][CNP] [Q] ||| 0-0 1-1 ||| 2.718
[X][Q] [X][CO] [X] ||| [X][Q] [X][CO] [Q] ||| 0-0 1-1 ||| 2.718
[X][Q] [X][CPP] [X] ||| [X][Q] [X][CPP] [Q] ||| 0-0 1-1 ||| 2.718
[X][Q] [X][CS] [X] ||| [X][Q] [X][CS] [Q] ||| 0-0 1-1 ||| 2.718
[X][Q] [X][CVP] [X] ||| [X][Q] [X][CVP] [Q] ||| 0-0 1-1 ||| 2.718
[X][Q] [X][CVZ] [X] ||| [X][Q] [X][CVZ] [Q] ||| 0-0 1-1 ||| 2.718
[X][Q] [X][DL] [X] ||| [X][Q] [X][DL] [Q] ||| 0-0 1-1 ||| 2.718
[X][Q] [X][FM] [X] ||| [X][Q] [X][FM] [Q] ||| 0-0 1-1 ||| 2.718
[X][Q] [X][ISU] [X] ||| [X][Q] [X][ISU] [Q] ||| 0-0 1-1 ||| 2.718
[X][Q] [X][ITJ] [X] ||| [X][Q] [X][ITJ] [Q] ||| 0-0 1-1 ||| 2.718
[X][Q] [X][KOKOM] [X] ||| [X][Q] [X][KOKOM] [Q] ||| 0-0 1-1 ||| 2.718
[X][Q] [X][KON] [X] ||| [X][Q] [X][KON] [Q] ||| 0-0 1-1 ||| 2.718
[X][Q] [X][KOUI] [X] ||| [X][Q] [X][KOUI] [Q] ||| 0-0 1-1 ||| 2.718
[X][Q] [X][KOUS] [X] ||| [X][Q] [X][KOUS] [Q] ||| 0-0 1-1 ||| 2.718
[X][Q] [X][MTA] [X] ||| [X][Q] [X][MTA] [Q] ||| 0-0 1-1 ||| 2.718
[X][Q] [X][NE] [X] ||| [X][Q] [X][NE] [Q] ||| 0-0 1-1 ||| 2.718
[X][Q] [X][NM] [X] ||| [X][Q] [X][NM] [Q] ||| 0-0 1-1 ||| 2.718
[X][Q] [X][NN] [X] ||| [X][Q] [X][NN] [Q] ||| 0-0 1-1 ||| 2.718
[X][Q] [X][NNE] [X] ||| [X][Q] [X][NNE] [Q] ||| 0-0 1-1 ||| 2.718
[X][Q] [X][NP] [X] ||| [X][Q] [X][NP] [Q] ||| 0-0 1-1 ||| 2.718
[X][Q] [X][PDAT] [X] ||| [X][Q] [X][PDAT] [Q] ||| 0-0 1-1 ||| 2.718
[X][Q] [X][PDS] [X] ||| [X][Q] [X][PDS] [Q] ||| 0-0 1-1 ||| 2.718
[X][Q] [X][PIAT] [X] ||| [X][Q] [X][PIAT] [Q] ||| 0-0 1-1 ||| 2.718
[X][Q] [X][PIS] [X] ||| [X][Q] [X][PIS] [Q] ||| 0-0 1-1 ||| 2.718
[X][Q] [X][PN] [X] ||| [X][Q] [X][PN] [Q] ||| 0-0 1-1 ||| 2.718
[X][Q] [X][PP] [X] ||| [X][Q] [X][PP] [Q] ||| 0-0 1-1 ||| 2.718
[X][Q] [X][PPER] [X] ||| [X][Q] [X][PPER] [Q] ||| 0-0 1-1 ||| 2.718
[X][Q] [X][PPOSAT] [X] ||| [X][Q] [X][PPOSAT] [Q] ||| 0-0 1-1 ||| 2.718
[X][Q] [X][PPOSS] [X] ||| [X][Q] [X][PPOSS] [Q] ||| 0-0 1-1 ||| 2.718
[X][Q] [X][PRELAT] [X] ||| [X][Q] [X][PRELAT] [Q] ||| 0-0 1-1 ||| 2.718
[X][Q] [X][PRELS] [X] ||| [X][Q] [X][PRELS] [Q] ||| 0-0 1-1 ||| 2.718
[X][Q] [X][PRF] [X] ||| [X][Q] [X][PRF] [Q] ||| 0-0 1-1 ||| 2.718
[X][Q] [X][PROAV] [X] ||| [X][Q] [X][PROAV] [Q] ||| 0-0 1-1 ||| 2.718
[X][Q] [X][PTKA] [X] ||| [X][Q] [X][PTKA] [Q] ||| 0-0 1-1 ||| 2.718
[X][Q] [X][PTKANT] [X] ||| [X][Q] [X][PTKANT] [Q] ||| 0-0 1-1 ||| 2.718
[X][Q] [X][PTKNEG] [X] ||| [X][Q] [X][PTKNEG] [Q] ||| 0-0 1-1 ||| 2.718
[X][Q] [X][PTKVZ] [X] ||| [X][Q] [X][PTKVZ] [Q] ||| 0-0 1-1 ||| 2.718
[X][Q] [X][PTKZU] [X] ||| [X][Q] [X][PTKZU] [Q] ||| 0-0 1-1 ||| 2.718
[X][Q] [X][PUNC,] [X] ||| [X][Q] [X][PUNC,] [Q] ||| 0-0 1-1 ||| 2.718
[X][Q] [X][PUNC.] [X] ||| [X][Q] [X][PUNC.] [Q] ||| 0-0 1-1 ||| 2.718
[X][Q] [X][PUNCPar] [X] ||| [X][Q] [X][PUNCPar] [Q] ||| 0-0 1-1 ||| 2.718
[X][Q] [X][PWAT] [X] ||| [X][Q] [X][PWAT] [Q] ||| 0-0 1-1 ||| 2.718
[X][Q] [X][PWAV] [X] ||| [X][Q] [X][PWAV] [Q] ||| 0-0 1-1 ||| 2.718
[X][Q] [X][PWS] [X] ||| [X][Q] [X][PWS] [Q] ||| 0-0 1-1 ||| 2.718
[X][Q] [X][S] [X] ||| [X][Q] [X][S] [Q] ||| 0-0 1-1 ||| 2.718
[X][Q] [X][TOP] [X] ||| [X][Q] [X][TOP] [Q] ||| 0-0 1-1 ||| 2.718
[X][Q] [X][TRUNC] [X] ||| [X][Q] [X][TRUNC] [Q] ||| 0-0 1-1 ||| 2.718
[X][Q] [X][VAFIN] [X] ||| [X][Q] [X][VAFIN] [Q] ||| 0-0 1-1 ||| 2.718
[X][Q] [X][VAINF] [X] ||| [X][Q] [X][VAINF] [Q] ||| 0-0 1-1 ||| 2.718
[X][Q] [X][VAPP] [X] ||| [X][Q] [X][VAPP] [Q] ||| 0-0 1-1 ||| 2.718
[X][Q] [X][VMFIN] [X] ||| [X][Q] [X][VMFIN] [Q] ||| 0-0 1-1 ||| 2.718
[X][Q] [X][VMINF] [X] ||| [X][Q] [X][VMINF] [Q] ||| 0-0 1-1 ||| 2.718
[X][Q] [X][VMPP] [X] ||| [X][Q] [X][VMPP] [Q] ||| 0-0 1-1 ||| 2.718
[X][Q] [X][VP] [X] ||| [X][Q] [X][VP] [Q] ||| 0-0 1-1 ||| 2.718
[X][Q] [X][VVFIN] [X] ||| [X][Q] [X][VVFIN] [Q] ||| 0-0 1-1 ||| 2.718
[X][Q] [X][VVIMP] [X] ||| [X][Q] [X][VVIMP] [Q] ||| 0-0 1-1 ||| 2.718
[X][Q] [X][VVINF] [X] ||| [X][Q] [X][VVINF] [Q] ||| 0-0 1-1 ||| 2.718
[X][Q] [X][VVIZU] [X] ||| [X][Q] [X][VVIZU] [Q] ||| 0-0 1-1 ||| 2.718
[X][Q] [X][VVPP] [X] ||| [X][Q] [X][VVPP] [Q] ||| 0-0 1-1 ||| 2.718
[X][Q] [X][VZ] [X] ||| [X][Q] [X][VZ] [Q] ||| 0-0 1-1 ||| 2.718
[X][Q] [X][XY] [X] ||| [X][Q] [X][XY] [Q] ||| 0-0 1-1 ||| 2.718
[X][Q] [X][X] [X] ||| [X][Q] [X][X] [Q] ||| 0-0 1-1 ||| 2.718
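Since there must be a glue rule for every non-terminal type, it can be worth checking a glue grammar against the labels that actually occur in the parsed training data. The following Python sketch is one way to do that, assuming the parsed corpus uses the `<tree label="...">` markup quoted earlier and the glue grammar uses the rule format listed above. The function names are hypothetical and not part of Moses.

```python
import re

# Matches the opening tag of a parse-tree node, e.g. <tree label="NP">
TREE_LABEL = re.compile(r'<tree label="([^"]+)">')


def corpus_labels(parsed_lines):
    """Collect every non-terminal label used in the parsed corpus."""
    labels = set()
    for line in parsed_lines:
        labels.update(TREE_LABEL.findall(line))
    return labels


def glue_labels(glue_lines):
    """Collect labels that have a concatenation rule [X][Q] [X][LABEL] [X]."""
    labels = set()
    for line in glue_lines:
        source = line.split("|||")[0].split()
        if (
            len(source) == 3
            and source[0] == "[X][Q]"
            and source[1].startswith("[X][")
            and source[1].endswith("]")
        ):
            labels.add(source[1][4:-1])  # strip the "[X][" prefix and "]" suffix
    return labels


def missing_glue_rules(parsed_lines, glue_lines):
    """Return labels seen in the corpus that have no glue rule."""
    return corpus_labels(parsed_lines) - glue_labels(glue_lines)
```

An empty result from `missing_glue_rules` means every label in the corpus is covered.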
#########################
### MOSES CONFIG FILE ###
#########################
# input factors
[input-factors]
0
# mapping steps
[mapping]
0 T 0
1 T 1
# translation tables: source-factors, target-factors, number of scores, file
[ttable-file]
6 0 0 5 /home/s0898777/experiments/wmt10-en-de-target-syntax/model/phrase-table.1
6 0 0 1 /home/s0898777/experiments/wmt10-en-de-target-syntax/model/glue-grammar.1
# no generation models, no generation-file section
# language models: type(srilm/irstlm), factors, order, file
[lmodel-file]
0 0 5 /home/s0898777/experiments/wmt10-en-de-target-syntax/lm/interpolated-lm.1
# limit on how many phrase translations e for each phrase f are loaded
# 0 = all elements loaded
[ttable-limit]
20
# language model weights
[weight-l]
0.5000
# translation model weights
[weight-t]
0.2
0.2
0.2
0.2
0.2
1.0
# no generation models, no weight-generation section
# word penalty
[weight-w]
-1
[cube-pruning-pop-limit]
1000
[glue-rule-type]
0
[non-terminals]
X
[search-algorithm]
3
[inputtype]
3
[max-chart-span]
20
1000
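Note that the ini file above lists six values under [weight-t]: five for the phrase table and one for the glue grammar. A small Python sketch for checking this in your own ini file follows. It assumes the fourth number of each [ttable-file] entry is that table's score count (my reading of the comment in the file) and that [weight-t] needs one value per score. The function name is hypothetical, not a Moses tool.

```python
def check_weight_count(ini_text):
    """Return (scores declared in [ttable-file], weights listed in [weight-t])."""
    sections = {}
    current = None
    for raw in ini_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            sections[current] = []
        elif current is not None:
            sections[current].append(line)

    expected = 0
    for line in sections.get("ttable-file", []):
        fields = line.split()
        # A table entry starts with four integers; the fourth is assumed
        # to be the score count. Lines holding only a path are ignored.
        if len(fields) >= 4 and all(f.isdigit() for f in fields[:4]):
            expected += int(fields[3])

    actual = len(sections.get("weight-t", []))
    return expected, actual
```

If the two numbers differ, the weights and tables are out of step, which is easy to cause when adding the glue grammar by hand.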