Hmm, strange. If you can send me the model, I'll look into it. To get the categories in each cell, uncomment line 104 of ChartManager.cpp. Feel free to turn it into a verbose-flag option if you wish.
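In the meantime, a rough offline check may help narrow things down. The sketch below is not part of Moses: it assumes the rule-table and glue-grammar line formats quoted further down in this thread, and the function names (target_lhs, glue_labels, uncovered_labels) are made up for illustration. It only compares the files as written, so it cannot tell you what survives [ttable-limit] pruning inside the decoder.

```python
# Hypothetical helper, not part of Moses. Assumes the line formats quoted in
# this thread:
#   rule table: "<source words> [X] ||| <target words> [LABEL] ||| scores ..."
#   glue rules: "[X][Q] [X][LABEL] [X] ||| [X][Q] [X][LABEL] [Q] ||| ..."


def target_lhs(rule_line):
    """Return the target-side left-hand-side label of a rule-table line."""
    fields = rule_line.split("|||")
    last_token = fields[1].split()[-1]  # e.g. "[(S\NP[expl])/(S[to]\NP)]"
    return last_token[1:-1]             # strip the outer brackets only


def glue_labels(glue_lines):
    """Collect every label that appears as a non-terminal on the source side."""
    labels = set()
    for line in glue_lines:
        source_tokens = line.split("|||")[0].split()
        # The last source token is the rule's own LHS ("[X]"), so skip it.
        for token in source_tokens[:-1]:
            if token.startswith("[X][") and token.endswith("]"):
                labels.add(token[len("[X]["):-1])
    return labels


def uncovered_labels(rule_lines, glue_lines, token):
    """Labels assigned to a single-word source `token` with no glue rule."""
    covered = glue_labels(glue_lines)
    missing = set()
    for line in rule_lines:
        fields = line.split("|||")
        if len(fields) < 2:
            continue
        source_tokens = fields[0].split()
        if source_tokens[:-1] == [token]:
            label = target_lhs(line)
            if label not in covered:
                missing.add(label)
    return missing


if __name__ == "__main__":
    rules = [
        r"说了算 [X] ||| is necessary [(S\NP[expl])/(S[to]\NP)] ||| 0.000309866 6.94e-05 0.00475133 0.00028945 2.718 ||| ||| 46 3",
        r"说了算 [X] ||| is necessary to [(S\NP[expl])/(S[b]\NP)] ||| 0.000208847 6.94e-05 0.00475133 1.07891e-05 2.718 ||| ||| 68.25 3",
    ]
    glue = [
        r"<s> [X] ||| <s> [Q] ||| 1 |||",
        r"[X][Q] [X][(S\NP[expl])/(S[to]\NP)] [X] ||| [X][Q] [X][(S\NP[expl])/(S[to]\NP)] [Q] ||| 2.718 ||| 0-0 1-1",
    ]
    # The second glue rule is deliberately left out to show a reported gap.
    print(sorted(uncovered_labels(rules, glue, "说了算")))
```

If this reports nothing for a token that still fails to combine, the files themselves are consistent and pruning or the cell contents are the more likely place to look.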
On 23/06/2011 09:55, Dennis Mehay wrote:
> Hi Hieu,
>
> with ttl's = 100 and 0
> --------------------------------------------------------
> Translating: <s> 说了算 </s> ||| [0,0]=X (1) [0,1]=X (1) [0,2]=X (1) [1,1]=X (1) [1,2]=X (1) [2,2]=X (1)
>
> Num of hypo = 4 --- cells:
>
> 0 1 2
> 1 3 0
> 0 0
> 0
> NO BEST TRANSLATION
> --------------------------------------------------------
>
> and with ttl's 100 and 100000000
> --------------------------------------------------------
> Translating: <s> 说了算 </s> ||| [0,0]=X (1) [0,1]=X (1) [0,2]=X (1) [1,1]=X (1) [1,2]=X (1) [2,2]=X (1)
>
> Num of hypo = 4 --- cells:
> 0 1 2
> 1 3 0
> 0 0
> 0
> NO BEST TRANSLATION
> --------------------------------------------------------
>
> This is from a fresh svn checkout that I compiled just before running.
> The glue rules seem to be failing when trying to combine the chart
> cells that cover "<s> 说了算".
>
> My glue grammar has 4666 entries in it, for what it's worth. I can
> send it to you if you want, but it might be too big to put up here on
> the forum.
>
> Is there a quick-and-dirty way to see what categories are inserted
> into which cells when (some verbosity setting, perhaps)?
>
>     I corrected this behaviour recently
>
>     http://mosesdecoder.svn.sourceforge.net/viewvc/mosesdecoder/trunk/moses/src/ChartTranslationOptionCollection.cpp?r1=4004&r2=4003&pathrev=4004
>
> Ah, yes. I named the binary ...19-june-2011, but I had copied it from
> a previous svn checkout, sorry. These things are still happening on
> the latest checkout, though.
>
> --D.N.
>
> 2011/6/22 Hieu Hoang <[email protected]>
>
>     hi dennis
>
>     You're right, it should be working. The entries in the glue rules
>     might be pruned.
>     Can you try to change the [ttable-limit] in the ini file to
>        [ttable-limit]
>        100
>        10000000
>     or
>        [ttable-limit]
>        100
>        0
>
>     Each row corresponds to the table pruning limit for each table. If
>     you provide only 1 entry, then it prunes every table uniformly.
>        StaticData.cpp (line 894)
>     For a grammar with lots of non-terminals like yours, the table
>     limit may be cutting off some of the entries in the glue rule
>     table.
>
>     Also, the decoder shouldn't be processing <s> and </s> as unknown
>     words; they should only be translated by the glue rules. This is
>     the reason you get 1587 translations of </s>.
>
>     I corrected this behaviour recently
>
>     http://mosesdecoder.svn.sourceforge.net/viewvc/mosesdecoder/trunk/moses/src/ChartTranslationOptionCollection.cpp?r1=4004&r2=4003&pathrev=4004
>
>     On 23/06/2011 06:14, Dennis Mehay wrote:
>> Just in case it confuses anyone, both commands (below) were run
>> in the same way, I just simplified it for expository purposes to
>> "moses_chart -f moses.ini -cube-pruning-pop-limit 2000" in the
>> first case, but not in the second.
>>
>> --D.N.
>>
>> 2011/6/22 Dennis Mehay <[email protected]>
>>
>>     Hi Philipp,
>>
>>     Thanks for the reply.
>>     I tracked some of the cases down to a *known* word (or
>>     whitespace-tokenized thingie, anyway -- I don't know much of
>>     what constitutes a word in written Chinese) by doing the
>>     following:
>>
>>     ----------------------------------------------------------------------
>>     $ echo "说了算" | moses_chart -f moses.ini -cube-pruning-pop-limit 2000
>>
>>     Translating: <s> 说了算 </s> ||| [0,0]=X (1) [0,1]=X (1) [0,2]=X (1) [1,1]=X (1) [1,2]=X (1) [2,2]=X (1)
>>
>>     Num of hypo = 1591 --- cells:
>>     0 1 2
>>     1 3 1587
>>     0 0
>>     0
>>     NO BEST TRANSLATION
>>     ----------------------------------------------------------------------
>>
>>     (An aside: 1587 is the number of categories in the unknown
>>     word list. Why does the last token, viz., "</s>", get that
>>     many cells?)
>>
>>     Anyhow, sure enough, there are three entries for the middle
>>     token "说了算"
>>
>>     ----------------------------------------------------------------------
>>     $ zless rule-table.gz
>>     ...
>>     说了算 [X] ||| is [((S\NP[expl])/(S[to]\NP))/(S[adj]\NP)] ||| 0.000113126 6.94e-05 0.00475133 0.5 2.718 ||| ||| 126 3
>>     说了算 [X] ||| is necessary [(S\NP[expl])/(S[to]\NP)] ||| 0.000309866 6.94e-05 0.00475133 0.00028945 2.718 ||| ||| 46 3
>>     说了算 [X] ||| is necessary to [(S\NP[expl])/(S[b]\NP)] ||| 0.000208847 6.94e-05 0.00475133 1.07891e-05 2.718 ||| ||| 68.25 3
>>     ...
>>     ----------------------------------------------------------------------
>>
>>     There are entries in the glue table for these three
>>     categories -- ((S\NP[expl])/(S[to]\NP))/(S[adj]\NP),
>>     (S\NP[expl])/(S[to]\NP) and (S\NP[expl])/(S[b]\NP) --- so we
>>     should be able to hack together a translation using any of them.
>>
>>     ----------------------------------------------------------------------
>>     <s> [X] ||| <s> [Q] ||| 1 |||
>>     ...
>>     [X][Q] [X][((S\NP[expl])/(S[to]\NP))/(S[adj]\NP)] [X] ||| [X][Q] [X][((S\NP[expl])/(S[to]\NP))/(S[adj]\NP)] [Q] ||| 2.718 ||| 0-0 1-1
>>     ...
>>     [X][Q] [X][(S\NP[expl])/(S[to]\NP)] [X] ||| [X][Q] [X][(S\NP[expl])/(S[to]\NP)] [Q] ||| 2.718 ||| 0-0 1-1
>>     ...
>>     [X][Q] [X][(S\NP[expl])/(S[b]\NP)] [X] ||| [X][Q] [X][(S\NP[expl])/(S[b]\NP)] [Q] ||| 2.718 ||| 0-0 1-1
>>     ...
>>     ----------------------------------------------------------------------
>>
>>     And just to be sure that it isn't an unknown word problem,
>>     let's mangle the token "说了算" by deleting the last
>>     character and see what happens:
>>
>>     ----------------------------------------------------------------------
>>     $ echo "说了" | ../moses/bin/moses-chart-19-june-2011 -f dev-test/ZhEn/mert/run1.moses.ini -cube-pruning-pop-limit 2000
>>     Translating: <s> 说了 </s> ||| [0,0]=X (1) [0,1]=X (1) [0,2]=X (1) [1,1]=X (1) [1,2]=X (1) [2,2]=X (1)
>>
>>     Num of hypo = 6396 --- cells:
>>     0 1 2
>>     1 1587 1587
>>     1 0
>>     1
>>     BEST TRANSLATION: 4763 Q </s> :0-0 : pC=0.000, c=-1.002 [0..2] 3176 [total=-22.789] <<-1.303, -1.940, -46.302, 0.000, 0.000, 0.000, 0.000, 0.000, 1.000>>
>>     说了
>>     ----------------------------------------------------------------------
>>
>>     The best "translation" is just a pass-through, as expected
>>     (and there are 1587 nodes for that unknown token -- just as
>>     many as there are unknown word lhs's in the unknown-lhs file).
>>
>>     Strange. Very strange. Or am I missing the obvious?
>>
>>     I'm at a loss here. Does anyone have any guesses as to what's
>>     going on here?
>>
>>     --D.N.
>>
>>     2011/6/22 Philipp Koehn <[email protected]>
>>
>>         Hi,
>>
>>         there always should be a rule to combine a span to the left.
>>
>>         Check what labels are chosen for the 13th word, and why there
>>         are no glue rules for it.
>>
>>         If I would hazard a guess, I would suspect that this is an
>>         unknown word and a file with the likely labels for unknown
>>         words is used, but these do not match the glue grammar.
>>
>>         -phi
>>
>>         2011/6/22 Dennis Mehay <[email protected]>:
>> > Hi all,
>> >
>> > I posted this, but it bounced. My attachments were too big. I'm
>> > resending without the larger attachment. Apologies for any
>> > duplicate posting.
>> >
>> > I'm running moses_chart to do some syntax-based MT experiments, and,
>> > during tuning, I'm coming across some instances where the decoder
>> > can't produce a translation (btw 32 and 38 in a 500 sentence tuning
>> > set). This should not be happening, so far as I can tell, since I
>> > have a glue grammar (where all the nonterminals of the training set
>> > plus the [Q] nonterminal are accounted for), and an 'unknown-lhs'
>> > list with the relative frequencies of all the categories as they
>> > span only a single word in the training set (i.e., the frequency of
>> > each category's spanning a single word in the rule table / the
>> > total number of single-word instances in the rule table).
>> >
>> > Here is an example of a sentence that there was no translation for:
>> >
>> > ---------------------------------------------------------------------------------------
>> > Translating: <s> 没有 规划 作 指导 , 就 可能 出现 谁 有 权 谁 说了算 , 谁 官 大 谁 说了算 . </s>
>> > ...
>> > Decoding:
>> > Num of hypo = 84813 --- cells:
>> > 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
>> > 1 100 77 93 83 99 99 100 100 85 99 43 85 3 99 85 18 100 85 3 14 1000
>> > 40 960 278 717 916 857 976 276 396 952 958 150 0 0 919 74 402 802 0 0 12
>> > 200 975 908 849 850 858 968 971 971 862 974 0 0 0 852 865 984 0 0 0
>> > 200 940 849 889 763 715 990 962 979 905 0 0 0 0 864 984 0 0 0
>> > 200 868 939 886 863 803 887 861 981 0 0 0 0 0 871 0 0 0
>> > 200 828 910 801 838 796 722 870 0 0 0 0 0 0 0 0 0
>> > 200 799 914 832 801 745 926 0 0 0 0 0 0 0 0 0
>> > 200 756 819 901 693 692 0 0 0 0 0 0 0 0 0
>> > 200 716 680 665 437 0 0 0 0 0 0 0 0 0
>> > 200 683 527 929 0 0 0 0 0 0 0 0 0
>> > 200 532 588 0 0 0 0 0 0 0 0 0
>> > 200 580 0 0 0 0 0 0 0 0 0
>> > 200 0 0 0 0 0 0 0 0 0
>> > 0 0 0 0 0 0 0 0 0
>> > 0 0 0 0 0 0 0 0
>> > 0 0 0 0 0 0 0
>> > 0 0 0 0 0 0
>> > 0 0 0 0 0
>> > 0 0 0 0
>> > 0 0 0
>> > 0 0
>> > 0
>> > NO BEST TRANSLATION
>> >
>> > Translation took 4.340 seconds
>> >
>> > ---------------------------------------------------------------------------------------
>> >
>> > The ASCII-art chart's alignment may be a bit off, but, just
>> > eye-balling it, it looks as if the 19th word (index 18) has a chart
>> > entry count above it, but then this entry does not get combined
>> > with what's to the left using the glue rules.
>> >
>> > Could this be a pruning or cutoff issue (i.e., stack size,
>> > cube-pruning-pop-limit, maximum number of rules per span, etc.)? Or
>> > maybe it has to do with the fact that my unknown-lhs file has *all*
>> > categories that spanned a single word in the training set. Maybe I
>> > should prune it to the top 10 or 20, or so. I'm really at a loss here.
>> > I thought the glue grammar would make the decoder always return an
>> > answer, no matter how awful.
>> >
>> > Any insight?
>> >
>> > I have attached my moses.ini file in case anyone wants to have a
>> > look. I can also send the glue rule file later, but, as I said, it
>> > seems to account for all of the training set's categories (and it
>> > was produced automatically using the -glue-grammar option).
>> >
>> > Best,
>> > Dennis
>> > _______________________________________________
>> > Moses-support mailing list
>> > [email protected]
>> > http://mailman.mit.edu/mailman/listinfo/moses-support
