Hi Dennis,

You're right, it should be working. The entries in the glue-rule table might be getting pruned. Can you try changing the [ttable-limit] section in the ini file to

[ttable-limit]
100
10000000

or

[ttable-limit]
100
0
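The following minimal Python sketch shows how a multi-row [ttable-limit] could be applied, one row per table. It is a simplified model and not Moses's implementation. It assumes that the rows map to the tables in the order the tables are listed, that a single row is shared by all tables, and that 0 means "no limit" (the second suggestion above implies this). It also assumes that rules are ranked by a single precomputed score. The rule representation and the names `limits_for_tables` and `prune_table` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: a rule is (target side, score), higher is better,
# and a table maps a lookup key (e.g. a source side) to its candidate rules.
Rule = Tuple[str, float]
Table = Dict[str, List[Rule]]


def limits_for_tables(limits: List[int], num_tables: int) -> List[int]:
    """Expand the rows of a [ttable-limit] section into one limit per table.

    Assumes one row per table, in the order the tables are listed; a single
    row is applied uniformly to every table.
    """
    if len(limits) == 1:
        return limits * num_tables
    if len(limits) != num_tables:
        raise ValueError("expected one limit per table, or a single shared limit")
    return list(limits)


def prune_table(table: Table, limit: int) -> Table:
    """Keep the `limit` best-scoring rules per key; 0 is treated as 'no limit'."""
    if limit == 0:
        return {src: list(rules) for src, rules in table.items()}
    return {
        src: sorted(rules, key=lambda rule: rule[1], reverse=True)[:limit]
        for src, rules in table.items()
    }


if __name__ == "__main__":
    # Toy glue table: 1587 rules grouped under one key, differing only in a
    # made-up target label. Moses does not necessarily group glue rules this way.
    toy_key = "[X][Q] [X][X] [X]"
    glue: Table = {toy_key: [(f"[Q] [CAT{i}]", 2.718) for i in range(1587)]}
    _, glue_limit = limits_for_tables([100, 0], num_tables=2)
    print(len(prune_table(glue, 100)[toy_key]))         # prints 100
    print(len(prune_table(glue, glue_limit)[toy_key]))  # prints 1587
```

In this toy model, a limit of 100 keeps only 100 of the 1587 rules that share a key, and a limit of 0 keeps all of them. If many glue rules really do compete under one entry, this could explain why the suggested limits might bring the missing glue rules back.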
Each row corresponds to the pruning limit for one table. If you provide only one entry, every table is pruned with that same limit (see StaticData.cpp, line 894). For a grammar with lots of non-terminals like yours, the table limit may be cutting off some of the entries in the glue-rule table.

Also, the decoder shouldn't be processing <s> and </s> as unknown words; they should only be translated by the glue rules. This is why you get 1587 translations of </s>. I corrected this behaviour recently:
http://mosesdecoder.svn.sourceforge.net/viewvc/mosesdecoder/trunk/moses/src/ChartTranslationOptionCollection.cpp?r1=4004&r2=4003&pathrev=4004

On 23/06/2011 06:14, Dennis Mehay wrote:
> Just in case it confuses anyone: both commands (below) were run in the
> same way. I just simplified the first one to
> "moses_chart -f moses.ini -cube-pruning-pop-limit 2000" for expository
> purposes, but not the second.
>
> --D.N.
>
> 2011/6/22 Dennis Mehay <[email protected]>
>
> Hi Philipp,
>
> Thanks for the reply. I tracked some of the cases down to a
> *known* word (or whitespace-tokenized thingie, anyway; I don't
> know much about what constitutes a word in written Chinese) by doing
> the following:
>
> ----------------------------------------------------------------------
> $ echo "说了算" | moses_chart -f moses.ini -cube-pruning-pop-limit 2000
>
> Translating: <s> 说了算 </s> ||| [0,0]=X (1) [0,1]=X (1) [0,2]=X (1) [1,1]=X (1) [1,2]=X (1) [2,2]=X (1)
>
> Num of hypo = 1591 --- cells:
>     0    1    2
>     1    3 1587
>     0    0
>     0
> NO BEST TRANSLATION
> ----------------------------------------------------------------------
>
> (An aside: 1587 is the number of categories in the unknown-word
> list. Why does the last token, viz., "</s>", get that many cells?)
>
> Anyhow, sure enough, there are three entries for the middle token
> "说了算":
>
> ----------------------------------------------------------------------
> $ zless rule-table.gz
> ...
> 说了算 [X] ||| is [((S\NP[expl])/(S[to]\NP))/(S[adj]\NP)] ||| 0.000113126 6.94e-05 0.00475133 0.5 2.718 ||| ||| 126 3
> 说了算 [X] ||| is necessary [(S\NP[expl])/(S[to]\NP)] ||| 0.000309866 6.94e-05 0.00475133 0.00028945 2.718 ||| ||| 46 3
> 说了算 [X] ||| is necessary to [(S\NP[expl])/(S[b]\NP)] ||| 0.000208847 6.94e-05 0.00475133 1.07891e-05 2.718 ||| ||| 68.25 3
> ...
> ----------------------------------------------------------------------
>
> There are entries in the glue table for these three categories,
> ((S\NP[expl])/(S[to]\NP))/(S[adj]\NP), (S\NP[expl])/(S[to]\NP) and
> (S\NP[expl])/(S[b]\NP), so we should be able to hack together a
> translation using any of them.
>
> ----------------------------------------------------------------------
> <s> [X] ||| <s> [Q] ||| 1 |||
> ...
> [X][Q] [X][((S\NP[expl])/(S[to]\NP))/(S[adj]\NP)] [X] ||| [X][Q] [X][((S\NP[expl])/(S[to]\NP))/(S[adj]\NP)] [Q] ||| 2.718 ||| 0-0 1-1
> ...
> [X][Q] [X][(S\NP[expl])/(S[to]\NP)] [X] ||| [X][Q] [X][(S\NP[expl])/(S[to]\NP)] [Q] ||| 2.718 ||| 0-0 1-1
> ...
> [X][Q] [X][(S\NP[expl])/(S[b]\NP)] [X] ||| [X][Q] [X][(S\NP[expl])/(S[b]\NP)] [Q] ||| 2.718 ||| 0-0 1-1
> ...
> ----------------------------------------------------------------------
>
> And just to be sure that it isn't an unknown-word problem, let's
> mangle the token "说了算" by deleting the last character and see
> what happens:
>
> ----------------------------------------------------------------------
> $ echo "说了" | ../moses/bin/moses-chart-19-june-2011 -f dev-test/ZhEn/mert/run1.moses.ini -cube-pruning-pop-limit 2000
> Translating: <s> 说了 </s> ||| [0,0]=X (1) [0,1]=X (1) [0,2]=X (1) [1,1]=X (1) [1,2]=X (1) [2,2]=X (1)
>
> Num of hypo = 6396 --- cells:
>     0    1    2
>     1 1587 1587
>     1    0
>     1
> BEST TRANSLATION: 4763 Q </s> :0-0 : pC=0.000, c=-1.002 [0..2] 3176 [total=-22.789] <<-1.303, -1.940, -46.302, 0.000, 0.000, 0.000, 0.000, 0.000, 1.000>>
> 说了
> ----------------------------------------------------------------------
>
> The best "translation" is just a pass-through, as expected (and
> there are 1587 nodes for that unknown token, just as many as
> there are unknown-word LHS labels in the unknown-lhs file).
>
> Strange. Very strange. Or am I missing the obvious?
>
> I'm at a loss here. Does anyone have any guesses as to what's
> going on?
>
> --D.N.
>
>
> 2011/6/22 Philipp Koehn <[email protected]>
>
> Hi,
>
> there should always be a rule to combine a span with what is to its left.
>
> Check what labels are chosen for the 13th word, and why there
> are no glue rules for it.
>
> If I were to hazard a guess, I would suspect that this is an
> unknown word and that a file with the likely labels for unknown words
> is being used, but these labels do not match the glue grammar.
>
> -phi
>
> 2011/6/22 Dennis Mehay <[email protected]>:
> > Hi all,
> >
> > I posted this, but it bounced; my attachments were too big. I'm resending
> > without the larger attachment. Apologies for any duplicate posting.
> >
> > I'm running moses_chart to do some syntax-based MT experiments, and, during
> > tuning, I'm coming across some instances where the decoder can't produce a
> > translation (between 32 and 38 in a 500-sentence tuning set). This should not
> > be happening, so far as I can tell, since I have a glue grammar (where all
> > the nonterminals of the training set plus the [Q] nonterminal are accounted
> > for), and an 'unknown-lhs' list with the relative frequencies of all the
> > categories as they span only a single word in the training set (i.e., the
> > frequency of each category's spanning a single word in the rule table, divided
> > by the total number of single-word instances in the rule table).
> >
> > Here is an example of a sentence for which there was no translation:
> >
> > ---------------------------------------------------------------------------------------
> > Translating: <s> 没有 规划 作 指导 , 就 可能 出现 谁 有 权 谁 说了算 , 谁 官 大 谁 说了算 . </s>
> > ...
> > Decoding:
> > Num of hypo = 84813 --- cells:
> >     0    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19   20   21
> >     1  100   77   93   83   99   99  100  100   85   99   43   85    3   99   85   18  100   85    3   14 1000
> >    40  960  278  717  916  857  976  276  396  952  958  150    0    0  919   74  402  802    0    0   12
> >   200  975  908  849  850  858  968  971  971  862  974    0    0    0  852  865  984    0    0    0
> >   200  940  849  889  763  715  990  962  979  905    0    0    0    0  864  984    0    0    0
> >   200  868  939  886  863  803  887  861  981    0    0    0    0    0  871    0    0    0
> >   200  828  910  801  838  796  722  870    0    0    0    0    0    0    0    0    0
> >   200  799  914  832  801  745  926    0    0    0    0    0    0    0    0    0
> >   200  756  819  901  693  692    0    0    0    0    0    0    0    0    0
> >   200  716  680  665  437    0    0    0    0    0    0    0    0    0
> >   200  683  527  929    0    0    0    0    0    0    0    0    0
> >   200  532  588    0    0    0    0    0    0    0    0    0
> >   200  580    0    0    0    0    0    0    0    0    0
> >   200    0    0    0    0    0    0    0    0    0
> >     0    0    0    0    0    0    0    0    0
> >     0    0    0    0    0    0    0    0
> >     0    0    0    0    0    0    0
> >     0    0    0    0    0    0
> >     0    0    0    0    0
> >     0    0    0    0
> >     0    0    0
> >     0    0
> >     0
> > NO BEST TRANSLATION
> >
> > Translation took 4.340 seconds
> >
> > ---------------------------------------------------------------------------------------
> >
> > The ASCII-art chart's alignment may be a bit off, but, just eyeballing it,
> > it looks as if the 19th word (index 18) has a chart entry count above it,
> > but then this entry does not get combined with what's to the left using the
> > glue rules.
> >
> > Could this be a pruning or cutoff issue (i.e., stack size,
> > cube-pruning-pop-limit, maximum number of rules per span, etc.)? Or maybe
> > it has to do with the fact that my unknown-lhs file has *all* categories
> > that spanned a single word in the training set. Maybe I should prune it to
> > the top 10 or 20 or so. I'm really at a loss here. I thought the glue
> > grammar would make the decoder always return an answer, no matter how awful.
> >
> > Any insight?
> >
> > I have attached my moses.ini file in case anyone wants to have a look. I
> > can also send the glue rule file later, but, as I said, it seems to account
> > for all of the training set's categories (and it was produced automatically
> > using the -glue-grammar option).
> >
> > Best,
> > Dennis
> > _______________________________________________
> > Moses-support mailing list
> > [email protected]
> > http://mailman.mit.edu/mailman/listinfo/moses-support
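Philipp's suggestion to check which labels lack glue rules can be partly automated. The following minimal Python sketch assumes the file layouts shown in this thread. It expects rule-table lines whose target side ends in the bracketed LHS label, and glue rules shaped like `[X][Q] [X][label] [X] ||| ...`. It also expects an unknown-lhs file with one label per line followed by its probability; that last format is an assumption, as is the label spelling (no brackets) in that file. The function names and the command-line usage are hypothetical. The script only reports labels with no matching glue rule and does not model pruning.

```python
import gzip
import sys
from typing import Iterable, Iterator, Set


def iter_lines(path: str) -> Iterator[str]:
    """Yield lines from a plain or gzip-compressed UTF-8 file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            yield line.rstrip("\n")


def rule_table_lhs(line: str) -> str:
    """Target-side LHS label of a rule-table line: the last token of the
    target side with its outer brackets removed."""
    target = line.split("|||")[1].split()
    return target[-1][1:-1]


def glue_labels(lines: Iterable[str]) -> Set[str]:
    """Labels covered by glue rules shaped like '[X][Q] [X][label] [X] ||| ...'."""
    labels = set()
    for line in lines:
        source = line.split("|||")[0].split()
        if (len(source) == 3 and source[0] == "[X][Q]" and source[2] == "[X]"
                and source[1].startswith("[X][") and source[1].endswith("]")):
            labels.add(source[1][len("[X]["):-1])
    return labels


def unknown_lhs_labels(lines: Iterable[str]) -> Set[str]:
    """Labels in an unknown-lhs file, assuming 'LABEL probability' per line."""
    return {line.split()[0] for line in lines if line.strip()}


def missing_glue_labels(rule_lines: Iterable[str], glue_lines: Iterable[str],
                        unknown_lines: Iterable[str]) -> Set[str]:
    """Labels used by the rule table or the unknown-lhs file with no glue rule."""
    needed = {rule_table_lhs(line) for line in rule_lines if "|||" in line}
    needed |= unknown_lhs_labels(unknown_lines)
    return needed - glue_labels(glue_lines)


if __name__ == "__main__" and len(sys.argv) == 4:
    # Hypothetical usage: check_glue.py rule-table.gz glue-grammar unknown-lhs
    rules, glue, unknown = (iter_lines(p) for p in sys.argv[1:])
    for label in sorted(missing_glue_labels(rules, glue, unknown)):
        print("no glue rule for", label)
```

If this check came back empty, that would suggest the glue grammar does cover every label. The more likely culprit would then be pruning, which is what the [ttable-limit] change targets.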
