Hi,

there should always be a rule to combine a span with what is to its left.
Check which labels are chosen for the 13th word, and why there are no glue
rules for it. If I had to hazard a guess, I would suspect that this is an
unknown word and that a file with the likely labels for unknown words is
used, but that these labels do not match the glue grammar.

-phi

2011/6/22 Dennis Mehay <[email protected]>:
> Hi all,
>
> I posted this, but it bounced. My attachments were too big. I'm resending
> without the larger attachment. Apologies for any duplicate posting.
>
> I'm running moses_chart to do some syntax-based MT experiments, and, during
> tuning, I'm coming across some instances where the decoder can't produce a
> translation (between 32 and 38 in a 500-sentence tuning set). This should not
> be happening, so far as I can tell, since I have a glue grammar (where all
> the nonterminals of the training set plus the [Q] nonterminal are accounted
> for), and an 'unknown-lhs' list with the relative frequencies of all the
> categories as they span only a single word in the training set (i.e., the
> frequency of each category's spanning a single word in the rule table / the
> total number of single-word instances in the rule table).
>
> Here is an example of a sentence that there was no translation for:
>
> ---------------------------------------------------------------------------------------
> Translating: <s> 没有 规划 作 指导 , 就 可能 出现 谁 有 权 谁 说了算 , 谁 官 大 谁 说了算 . </s>
> ...
> Decoding:
> Num of hypo = 84813 --- cells:
>    0    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19   20   21
>    1  100   77   93   83   99   99  100  100   85   99   43   85    3   99   85   18  100   85    3   14 1000
>   40  960  278  717  916  857  976  276  396  952  958  150    0    0  919   74  402  802    0    0   12
>  200  975  908  849  850  858  968  971  971  862  974    0    0    0  852  865  984    0    0    0
>  200  940  849  889  763  715  990  962  979  905    0    0    0    0  864  984    0    0    0
>  200  868  939  886  863  803  887  861  981    0    0    0    0    0  871    0    0    0
>  200  828  910  801  838  796  722  870    0    0    0    0    0    0    0    0    0
>  200  799  914  832  801  745  926    0    0    0    0    0    0    0    0    0
>  200  756  819  901  693  692    0    0    0    0    0    0    0    0    0
>  200  716  680  665  437    0    0    0    0    0    0    0    0    0
>  200  683  527  929    0    0    0    0    0    0    0    0    0
>  200  532  588    0    0    0    0    0    0    0    0    0
>  200  580    0    0    0    0    0    0    0    0    0
>  200    0    0    0    0    0    0    0    0    0
>    0    0    0    0    0    0    0    0    0
>    0    0    0    0    0    0    0    0
>    0    0    0    0    0    0    0
>    0    0    0    0    0    0
>    0    0    0    0    0
>    0    0    0    0
>    0    0    0
>    0    0
>    0
> NO BEST TRANSLATION
>
> Translation took 4.340 seconds
> ---------------------------------------------------------------------------------------
>
> The ASCII-art chart's alignment may be a bit off, but, just eye-balling it,
> it looks as if the 19th word (index 18) has a chart entry count above it,
> but then this entry does not get combined with what's to the left using the
> glue rules.
>
> Could this be a pruning or cutoff issue (i.e., stack size,
> cube-pruning-pop-limit, maximum number of rules per span, etc.)? Or maybe
> it has to do with the fact that my unknown-lhs file has *all* categories
> that spanned a single word in the training set. Maybe I should prune it to
> the top 10 or 20, or so. I'm really at a loss here. I thought the glue
> grammar would make the decoder always return an answer, no matter how awful.
>
> Any insight?
>
> I have attached my moses.ini file in case anyone wants to have a look. I
> can also send the glue rule file later, but, as I said, it seems to account
> for all of the training set's categories (and it was produced automatically
> using the -glue-grammar option).
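
The chart quoted above can also be read mechanically. With a working glue grammar, every prefix of the sentence (every span that starts at word 0) should hold at least one hypothesis, so the first empty prefix cell marks the word that could not be attached to the left. Below is a minimal Python sketch of that check, assuming the first number in each chart row is the cell for the span starting at word 0 and that rows are ordered by span width. The function name is made up for illustration and is not part of Moses.

```python
from typing import List, Optional


def first_unattached_word(prefix_counts: List[int]) -> Optional[int]:
    """Return the index of the first word that could not be joined to the prefix.

    prefix_counts[w - 1] is the hypothesis count of the span that starts at
    word 0 and has width w (the first number of each chart row).
    Returns None when every prefix, including the full sentence, is covered.
    """
    for width, count in enumerate(prefix_counts, start=1):
        if count == 0:
            # The prefix covering words 0 .. width-1 is empty, so the word at
            # index width-1 is the first one that failed to attach.
            return width - 1
    return None


# First number of each row in the chart from the quoted message
# (22 rows, for span widths 1 to 22).
prefix_counts = [1, 40] + [200] * 11 + [0] * 9

print(first_unattached_word(prefix_counts))  # prints 13
```

For this sentence the widest non-empty prefix is 13 words wide (indices 0 to 12), and the first word that fails to attach is index 13. Counting <s> as index 0, that token is 说了算, which occurs again at index 19; both positions have only 3 entries in their single-word cells.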
>
> Best,
> Dennis
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
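
The suspicion in the reply (unknown-word labels that have no matching glue rule) can be tested with a small script. This is a rough Python sketch under stated assumptions: the unknown-lhs file is taken to have one label followed by a weight per line, and glue rules are taken to use `|||` as the field separator with nonterminals written in square brackets on the source side. The function names and the sample lines are hypothetical; check them against your own files before relying on the result.

```python
import re
from typing import Iterable, Set

# Matches one bracketed nonterminal such as [X], [Q] or [NP].
LABEL = re.compile(r"\[([^\[\]\s]+)\]")


def glue_labels(glue_lines: Iterable[str]) -> Set[str]:
    """Collect every bracketed label on the source side of the glue rules."""
    labels: Set[str] = set()
    for line in glue_lines:
        source_side = line.split("|||")[0]  # assumed field separator
        labels.update(LABEL.findall(source_side))
    return labels


def unknown_word_labels(lhs_lines: Iterable[str]) -> Set[str]:
    """Collect the labels listed in the unknown-lhs file (first field per line)."""
    labels: Set[str] = set()
    for line in lhs_lines:
        fields = line.split()
        if fields:
            labels.add(fields[0].strip("[]"))
    return labels


def missing_from_glue(lhs_lines: Iterable[str], glue_lines: Iterable[str]) -> Set[str]:
    """Labels an unknown word may receive that no glue rule can combine."""
    return unknown_word_labels(lhs_lines) - glue_labels(glue_lines)


# Hypothetical sample data, not taken from the files discussed in this thread.
sample_lhs = ["NP 0.6", "VP 0.3", "ADVP 0.1"]
sample_glue = [
    "<s> [X] ||| <s> [Q] ||| 1",
    "[X][Q] [X][NP] [X] ||| [X][Q] [X][NP] [Q] ||| 2.718",
    "[X][Q] [X][VP] [X] ||| [X][Q] [X][VP] [Q] ||| 2.718",
]

print(missing_from_glue(sample_lhs, sample_glue))  # prints {'ADVP'}
```

To run it on real files, pass `open(path, encoding="utf-8")` for each argument. Any label it reports can be assigned to an unknown word but can never be glued to the span on its left, which would produce exactly the kind of empty chart cells seen above.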
