Hi all, I posted this earlier, but it bounced because my attachments were too big, so I'm resending it without the larger attachment. Apologies for any duplicate posting.
I'm running moses_chart for some syntax-based MT experiments, and during
tuning I'm hitting cases where the decoder can't produce a translation
(between 32 and 38 sentences of a 500-sentence tuning set). As far as I can
tell, this shouldn't happen. I have a glue grammar in which all the
nonterminals of the training set, plus the [Q] nonterminal, are accounted
for. I also have an 'unknown-lhs' list with the relative frequencies of all
the categories that span only a single word in the training set (i.e., the
number of times each category spans a single word in the rule table,
divided by the total number of single-word instances in the rule table).
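A minimal Python sketch of the relative-frequency computation described above. It assumes (hypothetically) that the single-word rules have already been reduced to a list of their left-hand-side category labels, and it renders one `LABEL probability` line per category. That line layout is an assumption about the unknown-lhs file, so check it against what moses_chart actually reads.

```python
from collections import Counter


def unknown_lhs_distribution(single_word_lhs_labels):
    """Relative frequency of each LHS category among single-word rules.

    `single_word_lhs_labels` is a hypothetical input: one label per
    rule-table instance whose source side spans exactly one word.
    """
    counts = Counter(single_word_lhs_labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    # count of the category spanning a single word / all single-word instances
    return {label: n / total for label, n in counts.items()}


def format_unknown_lhs(dist):
    """Render 'LABEL probability' lines, most frequent first (assumed layout)."""
    ordered = sorted(dist.items(), key=lambda kv: (-kv[1], kv[0]))
    return "\n".join(f"{label} {prob:.6f}" for label, prob in ordered)


if __name__ == "__main__":
    # Toy labels for illustration only, not taken from a real rule table.
    labels = ["NN", "NN", "VV", "NN", "AD"]
    print(format_unknown_lhs(unknown_lhs_distribution(labels)))
```

Ties are broken alphabetically so the output is deterministic from run to run.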
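Here is an example of a sentence for which there was no translation (roughly: "Without planning as a guide, it may end up that whoever has power has the final say, and whoever holds the higher office has the final say"):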
---------------------------------------------------------------------------------------
Translating: <s> 没有 规划 作 指导 , 就 可能 出现 谁 有 权 谁 说了算 , 谁 官 大 谁 说了算 . </s>
...
Decoding:
Num of hypo = 84813 --- cells:
    0    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19   20   21
    1  100   77   93   83   99   99  100  100   85   99   43   85    3   99   85   18  100   85    3   14 1000
   40  960  278  717  916  857  976  276  396  952  958  150    0    0  919   74  402  802    0    0   12
  200  975  908  849  850  858  968  971  971  862  974    0    0    0  852  865  984    0    0    0
  200  940  849  889  763  715  990  962  979  905    0    0    0    0  864  984    0    0    0
  200  868  939  886  863  803  887  861  981    0    0    0    0    0  871    0    0    0
  200  828  910  801  838  796  722  870    0    0    0    0    0    0    0    0    0
  200  799  914  832  801  745  926    0    0    0    0    0    0    0    0    0
  200  756  819  901  693  692    0    0    0    0    0    0    0    0    0
  200  716  680  665  437    0    0    0    0    0    0    0    0    0
  200  683  527  929    0    0    0    0    0    0    0    0    0
  200  532  588    0    0    0    0    0    0    0    0    0
  200  580    0    0    0    0    0    0    0    0    0
  200    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0
    0    0    0    0    0    0
    0    0    0    0    0
    0    0    0    0
    0    0    0
    0    0
    0
NO BEST TRANSLATION
Translation took 4.340 seconds
---------------------------------------------------------------------------------------
The chart's ASCII-art alignment may be a bit off, but, just eyeballing it,
it looks as if the 19th word (index 18) has a nonzero hypothesis count in
its own cell, yet that entry never gets combined by the glue rules with
what's to its left.
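Rather than eyeballing the printout, a short script can parse the cell counts and report where the chart breaks. This sketch assumes a particular reading of the verbose output: after the header of positions, row k lists the hypothesis counts for spans of k words, one per start position from left to right, with `<s>` at position 0. That reading is my assumption about the moses_chart log, not documented behavior, and the function names are hypothetical.

```python
def parse_chart(rows_text):
    """Parse the hypothesis-count rows (header of positions excluded).

    Assumed layout: row k (1-based) holds the cells for spans of k words,
    one per start position, left to right.
    """
    rows = [[int(tok) for tok in line.split()]
            for line in rows_text.strip().splitlines() if line.strip()]
    if not rows:
        raise ValueError("no rows to parse")
    n = len(rows[0])
    if len(rows) != n:
        raise ValueError(f"expected {n} rows, got {len(rows)}")
    # Under the assumed layout, each row is one cell shorter than the one above.
    for k, row in enumerate(rows):
        if len(row) != n - k:
            raise ValueError(f"row {k + 1} has {len(row)} cells, expected {n - k}")
    return rows


def cell(rows, start, width):
    """Hypothesis count for the span covering positions start .. start+width-1."""
    return rows[width - 1][start]


def longest_prefix(rows):
    """Widest span starting at position 0 (<s>) that holds any hypothesis."""
    return max((w for w in range(1, len(rows) + 1) if cell(rows, 0, w) > 0), default=0)


def isolated_positions(rows):
    """Positions with single-word hypotheses but no non-empty 2-word cell covering them."""
    n = len(rows[0])
    isolated = []
    for i in range(n):
        if cell(rows, i, 1) == 0:
            continue
        left = cell(rows, i - 1, 2) if i > 0 else 0
        right = cell(rows, i, 2) if i + 1 < n else 0
        if left == 0 and right == 0:
            isolated.append(i)
    return isolated
```

Fed the 22 count rows from the log above, and if that reading of the layout is right, `longest_prefix` returns 13: the `<s>`-anchored spans stop after position 12. `isolated_positions` returns `[13, 19]`, the two occurrences of 说了算. Position 19 is index 18 when `<s>` is not counted.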
Could this be a pruning or cutoff issue (e.g., stack size,
cube-pruning-pop-limit, maximum number of rules per span)? Or maybe it has
to do with the fact that my unknown-lhs file contains *all* categories that
spanned a single word in the training set, and I should prune it to the top
10 or 20 or so. I'm really at a loss here: I thought the glue grammar would
guarantee that the decoder always returns an answer, no matter how awful.
Any insight?
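If the unknown-lhs list does get pruned to the top 10 or 20 categories, here is a minimal sketch of that step. It assumes the file holds one `LABEL probability` pair per line, which is an assumed layout. Renormalizing the kept categories so they sum to 1 is a design choice made here, not something moses_chart is known to require, and nothing here shows that pruning will fix the missing translations.

```python
def read_unknown_lhs(text):
    """Parse 'LABEL probability' lines (an assumed layout) into a dict."""
    dist = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        label, prob = line.split()
        dist[label] = float(prob)
    return dist


def prune_top_k(dist, k):
    """Keep the k most probable categories, renormalized to sum to 1."""
    if k <= 0:
        raise ValueError("k must be positive")
    top = sorted(dist.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
    mass = sum(p for _, p in top)
    if mass <= 0:
        raise ValueError("the kept categories have zero total probability")
    return {label: p / mass for label, p in top}


def write_unknown_lhs(dist):
    """Render the distribution as 'LABEL probability' lines, most probable first."""
    ordered = sorted(dist.items(), key=lambda kv: (-kv[1], kv[0]))
    return "\n".join(f"{label} {p:.6f}" for label, p in ordered)
```

For example, `write_unknown_lhs(prune_top_k(read_unknown_lhs(text), 20))` keeps the 20 most frequent categories. Dropping the renormalization would instead leave the kept probabilities at their original values.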
I have attached my moses.ini file in case anyone wants to have a look. I
can also send the glue rule file later, but, as I said, it seems to account
for all of the training set's categories (and it was produced automatically
using the -glue-grammar option).
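As a coarse sanity check on the glue file, one could confirm that every category in the unknown-lhs list actually appears on the target side of some glue rule. This sketch assumes `|||`-separated rule lines with nonterminal labels in square brackets (e.g. `[X][NP]`). It only checks that each label appears somewhere; it does not check that the rule around it is well-formed. The function names are hypothetical.

```python
import re

# Nonterminal labels are assumed to be written in square brackets, e.g. [X][NP] or [S].
LABEL = re.compile(r"\[([^\[\]]+)\]")


def glue_target_labels(glue_text):
    """Collect every bracketed label on the target side of the glue rules.

    Assumes lines of the form 'source ||| target ||| scores ||| ...'.
    """
    labels = set()
    for line in glue_text.splitlines():
        fields = line.split("|||")
        if len(fields) < 2:
            continue  # not a rule line
        labels.update(LABEL.findall(fields[1]))
    return labels


def missing_glue_labels(categories, glue_text):
    """Categories (e.g. from the unknown-lhs file) that no glue rule mentions."""
    return sorted(set(categories) - glue_target_labels(glue_text))
```

An empty result from `missing_glue_labels` means only that every category is mentioned. It does not show that the glue rules can actually combine those categories.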
Best,
Dennis
_______________________________________________
Moses-support mailing list
http://mailman.mit.edu/mailman/listinfo/moses-support
