In case it confuses anyone: both commands (below) were run in the same way. For expository purposes I simplified the first one to "moses_chart -f moses.ini -cube-pruning-pop-limit 2000", but did not do the same for the second.
--D.N.

2011/6/22 Dennis Mehay <[email protected]>

> Hi Philipp,
>
> Thanks for the reply. I tracked some of the cases down to a *known* word
> (or whitespace-tokenized thingie, anyway -- I don't know much of what
> constitutes a word in written Chinese) by doing the following:
>
> ----------------------------------------------------------------------
> $ echo "说了算" | moses_chart -f moses.ini -cube-pruning-pop-limit 2000
>
> Translating: <s> 说了算 </s> ||| [0,0]=X (1) [0,1]=X (1) [0,2]=X (1) [1,1]=X (1) [1,2]=X (1) [2,2]=X (1)
>
> Num of hypo = 1591 --- cells:
> 0 1 2
> 1 3 1587
> 0 0
> 0
> NO BEST TRANSLATION
> ----------------------------------------------------------------------
>
> (An aside: 1587 is the number of categories in the unknown word list. Why
> does the last token, viz., "</s>", get that many cells?)
>
> Anyhow, sure enough, there are three entries for the middle token "说了算"
>
> ----------------------------------------------------------------------
> $ zless rule-table.gz
> ...
> 说了算 [X] ||| is [((S\NP[expl])/(S[to]\NP))/(S[adj]\NP)] ||| 0.000113126 6.94e-05 0.00475133 0.5 2.718 ||| ||| 126 3
> 说了算 [X] ||| is necessary [(S\NP[expl])/(S[to]\NP)] ||| 0.000309866 6.94e-05 0.00475133 0.00028945 2.718 ||| ||| 46 3
> 说了算 [X] ||| is necessary to [(S\NP[expl])/(S[b]\NP)] ||| 0.000208847 6.94e-05 0.00475133 1.07891e-05 2.718 ||| ||| 68.25 3
> ...
> ----------------------------------------------------------------------
>
> There are entries in the glue table for these three categories --
> ((S\NP[expl])/(S[to]\NP))/(S[adj]\NP), (S\NP[expl])/(S[to]\NP) and
> (S\NP[expl])/(S[b]\NP) --- so we should be able to hack together a
> translation using any of them.
>
> ----------------------------------------------------------------------
> <s> [X] ||| <s> [Q] ||| 1 |||
> ...
> [X][Q] [X][((S\NP[expl])/(S[to]\NP))/(S[adj]\NP)] [X] ||| [X][Q] [X][((S\NP[expl])/(S[to]\NP))/(S[adj]\NP)] [Q] ||| 2.718 ||| 0-0 1-1
> ...
> [X][Q] [X][(S\NP[expl])/(S[to]\NP)] [X] ||| [X][Q] [X][(S\NP[expl])/(S[to]\NP)] [Q] ||| 2.718 ||| 0-0 1-1
> ...
> [X][Q] [X][(S\NP[expl])/(S[b]\NP)] [X] ||| [X][Q] [X][(S\NP[expl])/(S[b]\NP)] [Q] ||| 2.718 ||| 0-0 1-1
> ...
> ----------------------------------------------------------------------
>
> And just to be sure that it isn't an unknown word problem, let's mangle the
> token "说了算" by deleting the last character and see what happens:
>
> ----------------------------------------------------------------------
> $ echo "说了" | ../moses/bin/moses-chart-19-june-2011 -f dev-test/ZhEn/mert/run1.moses.ini -cube-pruning-pop-limit 2000
> Translating: <s> 说了 </s> ||| [0,0]=X (1) [0,1]=X (1) [0,2]=X (1) [1,1]=X (1) [1,2]=X (1) [2,2]=X (1)
>
> Num of hypo = 6396 --- cells:
> 0 1 2
> 1 1587 1587
> 1 0
> 1
> BEST TRANSLATION: 4763 Q </s> :0-0 : pC=0.000, c=-1.002 [0..2] 3176 [total=-22.789] <<-1.303, -1.940, -46.302, 0.000, 0.000, 0.000, 0.000, 0.000, 1.000>>
> 说了
> ----------------------------------------------------------------------
>
> The best "translation" is just a pass-through, as expected (and there are
> 1587 nodes for that unknown token -- just as many as there are unknown word
> lhs's in the unknown-lhs file).
>
> Strange. Very strange. Or am I missing the obvious?
>
> I'm at a loss here. Does anyone have any guesses as to what's going on
> here?
>
> --D.N.
>
>
> 2011/6/22 Philipp Koehn <[email protected]>
>
>> Hi,
>>
>> there always should be a rule to combine a span to the left.
>>
>> Check what labels are chosen for the 13th word, and why there
>> are no glue rules for it.
>>
>> If I would hazard a guess, I would suspect that this is an
>> unknown word and a file with the likely labels for unknown words
>> is used, but these do not match the glue grammar.
>>
>> -phi
>>
>> 2011/6/22 Dennis Mehay <[email protected]>:
>> > Hi all,
>> >
>> > I posted this, but it bounced. My attachments were too big.
>> > I'm resending without the larger attachment. Apologies for any duplicate
>> > posting.
>> >
>> > I'm running moses_chart to do some syntax-based MT experiments, and,
>> > during tuning, I'm coming across some instances where the decoder can't
>> > produce a translation (btw 32 and 38 in a 500 sentence tuning set). This
>> > should not be happening, so far as I can tell, since I have a glue grammar
>> > (where all the nonterminals of the training set plus the [Q] nonterminal
>> > are accounted for), and an 'unknown-lhs' list with the relative
>> > frequencies of all the categories as they span only a single word in the
>> > training set (i.e., the frequency of each category's spanning a single
>> > word in the rule table / the total number of single-word instances in the
>> > rule table).
>> >
>> > Here is an example of a sentence that there was no translation for:
>> >
>> > ---------------------------------------------------------------------------------------
>> > Translating: <s> 没有 规划 作 指导 , 就 可能 出现 谁 有 权 谁 说了算 , 谁 官 大 谁 说了算 . </s>
>> > ...
>> > Decoding:
>> > Num of hypo = 84813 --- cells:
>> > 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
>> > 1 100 77 93 83 99 99 100 100 85 99 43 85 3 99 85 18 100 85 3 14 1000
>> > 40 960 278 717 916 857 976 276 396 952 958 150 0 0 919 74 402 802 0 0 12
>> > 200 975 908 849 850 858 968 971 971 862 974 0 0 0 852 865 984 0 0 0
>> > 200 940 849 889 763 715 990 962 979 905 0 0 0 0 864 984 0 0 0
>> > 200 868 939 886 863 803 887 861 981 0 0 0 0 0 871 0 0 0
>> > 200 828 910 801 838 796 722 870 0 0 0 0 0 0 0 0 0
>> > 200 799 914 832 801 745 926 0 0 0 0 0 0 0 0 0
>> > 200 756 819 901 693 692 0 0 0 0 0 0 0 0 0
>> > 200 716 680 665 437 0 0 0 0 0 0 0 0 0
>> > 200 683 527 929 0 0 0 0 0 0 0 0 0
>> > 200 532 588 0 0 0 0 0 0 0 0 0
>> > 200 580 0 0 0 0 0 0 0 0 0
>> > 200 0 0 0 0 0 0 0 0 0
>> > 0 0 0 0 0 0 0 0 0
>> > 0 0 0 0 0 0 0 0
>> > 0 0 0 0 0 0 0
>> > 0 0 0 0 0 0
>> > 0 0 0 0 0
>> > 0 0 0 0
>> > 0 0 0
>> > 0 0
>> > 0
>> > NO BEST TRANSLATION
>> >
>> > Translation took 4.340 seconds
>> > ---------------------------------------------------------------------------------------
>> >
>> > The ASCII-art chart's alignment may be a bit off, but, just eye-balling
>> > it, it looks as if the 19th word (index 18) has a chart entry count above
>> > it, but then this entry does not get combined with what's to the left
>> > using the glue rules.
>> >
>> > Could this be a pruning or cutoff issue (i.e., stack size,
>> > cube-pruning-pop-limit, maximum number of rules per span, etc.)? Or maybe
>> > it has to do with the fact that my unknown-lhs file has *all* categories
>> > that spanned a single word in the training set. Maybe I should prune it
>> > to the top 10 or 20, or so. I'm really at a loss here. I thought the glue
>> > grammar would make the decoder always return an answer, no matter how
>> > awful.
>> >
>> > Any insight?
>> >
>> > I have attached my moses.ini file in case anyone wants to have a look.
>> > I can also send the glue rule file later, but, as I said, it seems to
>> > account for all of the training set's categories (and it was produced
>> > automatically using the -glue-grammar option).
>> >
>> > Best,
>> > Dennis
>> > _______________________________________________
>> > Moses-support mailing list
>> > [email protected]
>> > http://mailman.mit.edu/mailman/listinfo/moses-support
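
The cell-count charts quoted in this thread can be read mechanically rather than by eye. The sketch below is only an illustration: it assumes that row w of the chart (after the "0 1 2 ..." header) lists the hypothesis counts for spans of width w, ordered by start position. That reading is an interpretation of the decoder output, not something confirmed in the thread, and the function names are hypothetical.

```python
def parse_chart(rows):
    """Parse whitespace-separated count rows into chart[width - 1][start].

    Assumption: row w holds the counts for all spans of width w,
    left to right by start position (header row already removed).
    """
    return [[int(tok) for tok in row.split()] for row in rows if row.strip()]


def longest_covered_prefix(chart):
    """Return the largest end index e such that span [0, e] has hypotheses.

    Returns -1 if not even the first token has a chart entry.
    """
    best = -1
    for width_minus_1, row in enumerate(chart):
        if row and row[0] > 0:
            best = width_minus_1
    return best


def has_full_parse(chart):
    """True if the span covering the whole input has at least one hypothesis."""
    return bool(chart) and chart[-1][0] > 0


# The two three-token charts from the thread (header row "0 1 2" omitted).
failing = parse_chart(["1 3 1587", "0 0", "0"])      # <s> 说了算 </s>
passing = parse_chart(["1 1587 1587", "1 0", "1"])   # <s> 说了 </s>

for name, chart in (("failing", failing), ("passing", passing)):
    print(name, longest_covered_prefix(chart), has_full_parse(chart))
```

Under the same assumption, the chart for the 22-token sentence has hypotheses for span [0,12] but none for [0,13], so the first token that fails to attach to the left context would be index 13, the first "说了算".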
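
Philipp's suggestion, checking that every label in play has a matching glue rule, can be sketched as a small script. It assumes the rule-table and glue-grammar line layouts shown in the excerpts in this thread (fields separated by "|||", the target label as the last bracketed token on the target side, glue rules of the form "[X][Q] [X][LABEL] [X]"). The function names are hypothetical and this is not part of Moses.

```python
def target_label(rule_line):
    """Label of a rule-table entry: last bracketed token on the target side."""
    fields = [field.strip() for field in rule_line.split("|||")]
    last_token = fields[1].split()[-1]
    return last_token[1:-1]  # strip the outer "[" and "]"


def glue_labels(glue_lines):
    """Collect every LABEL that appears as [X][LABEL] on a glue rule's source side."""
    labels = set()
    for line in glue_lines:
        source_side = line.split("|||")[0]
        for token in source_side.split():
            if token.startswith("[X][") and token.endswith("]"):
                labels.add(token[len("[X]["):-1])
    return labels


def missing_glue(rule_lines, glue_lines):
    """Labels used by the given rules that no glue rule mentions."""
    used = {target_label(line) for line in rule_lines}
    return sorted(used - glue_labels(glue_lines))


# Lines copied from the thread; raw strings keep the backslashes intact.
rules = [
    r"说了算 [X] ||| is necessary [(S\NP[expl])/(S[to]\NP)] ||| 0.000309866 6.94e-05 0.00475133 0.00028945 2.718 ||| ||| 46 3",
    r"说了算 [X] ||| is necessary to [(S\NP[expl])/(S[b]\NP)] ||| 0.000208847 6.94e-05 0.00475133 1.07891e-05 2.718 ||| ||| 68.25 3",
]
glue = [
    r"<s> [X] ||| <s> [Q] ||| 1 |||",
    r"[X][Q] [X][(S\NP[expl])/(S[to]\NP)] [X] ||| [X][Q] [X][(S\NP[expl])/(S[to]\NP)] [Q] ||| 2.718 ||| 0-0 1-1",
    r"[X][Q] [X][(S\NP[expl])/(S[b]\NP)] [X] ||| [X][Q] [X][(S\NP[expl])/(S[b]\NP)] [Q] ||| 2.718 ||| 0-0 1-1",
]

print(missing_glue(rules, glue))       # labels with no glue rule
print(missing_glue(rules, glue[:2]))   # same check with one glue rule deliberately left out
```

For the excerpts quoted in the thread the first call reports nothing missing, which agrees with the observation that the glue table covers these categories; the second call only demonstrates what a gap would look like.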
