hmm, strange. if you can send me the model, i'll look into it.

to see the categories in each cell, uncomment line 104 of
ChartManager.cpp
feel free to make it into a verbose flag option if you wish
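
The label check Dennis describes further down (does every target-side label of a token's rule-table entries also have a glue rule?) can be scripted as well. This is only a rough sketch in Python, not part of Moses: the function names are made up, and it assumes the rule-table and glue-rule lines look exactly like the excerpts quoted in this thread.

```python
def split_fields(line):
    """Split a rule line on the '|||' separator and trim each field."""
    return [field.strip() for field in line.split("|||")]


def rule_table_labels(lines, source_token):
    """Target-side labels of rules whose source side is exactly '<token> [X]'."""
    labels = set()
    for line in lines:
        fields = split_fields(line)
        if len(fields) < 2:
            continue
        source, target = fields[0].split(), fields[1].split()
        if source == [source_token, "[X]"] and target:
            labels.add(target[-1][1:-1])  # drop the outer "[" and "]"
    return labels


def glue_labels(lines):
    """Labels L found in glue rules shaped like '[X][Q] [X][L] [X] ||| ...'."""
    labels = set()
    for line in lines:
        source = split_fields(line)[0].split()
        if (len(source) == 3 and source[0] == "[X][Q]"
                and source[1].startswith("[X][")):
            labels.add(source[1][len("[X]["):-1])
    return labels


def labels_without_glue(rule_lines, glue_lines, source_token):
    """Labels the token can get from the rule table that no glue rule accepts."""
    return rule_table_labels(rule_lines, source_token) - glue_labels(glue_lines)


# Tiny example built from two lines quoted in this thread (raw strings,
# because the CCG labels contain backslashes).
rule_lines = [
    r"说了算 [X] ||| is necessary [(S\NP[expl])/(S[to]\NP)] ||| 0.000309866 ||| ||| 46 3",
]
glue_lines = [
    r"[X][Q] [X][(S\NP[expl])/(S[to]\NP)] [X] ||| [X][Q] [X][(S\NP[expl])/(S[to]\NP)] [Q] ||| 2.718 ||| 0-0 1-1",
]
print(labels_without_glue(rule_lines, glue_lines, "说了算"))
```

An empty set here would mean every label for the token has a matching glue rule, which points away from a label mismatch and towards pruning. For the real files, the lines could be read with gzip.open(path, "rt", encoding="utf-8").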

On 23/06/2011 09:55, Dennis Mehay wrote:
> Hi Hieu,
>
> with ttl's = 100 and 0
> --------------------------------------------------------
> Translating: <s> 说了算 </s> ||| [0,0]=X (1) [0,1]=X (1) [0,2]=X (1)
> [1,1]=X (1) [1,2]=X (1) [2,2]=X (1)
>
> Num of hypo = 4 --- cells:
>
> 0 1 2
> 1 3 0
> 0 0
> 0
> NO BEST TRANSLATION
> --------------------------------------------------------
>
> and with ttl's 100 and 100000000
> --------------------------------------------------------
> Translating: <s> 说了算 </s> ||| [0,0]=X (1) [0,1]=X (1) [0,2]=X (1)
> [1,1]=X (1) [1,2]=X (1) [2,2]=X (1)
>
> Num of hypo = 4 --- cells:
> 0 1 2
> 1 3 0
> 0 0
> 0
> NO BEST TRANSLATION
> --------------------------------------------------------
>
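
For reading the "cells" output above, here is a small Python sketch. It assumes (from the shape of the output, not from the Moses source) that the first line is the word index and that each following line gives hypothesis counts for spans one word wider than the line before, so the last line is the cell covering the whole sentence. The function names are made up.

```python
def parse_chart(text):
    """Parse the 'cells' block; the first row (word indices) is dropped."""
    rows = [[int(tok) for tok in line.split()]
            for line in text.strip().splitlines()]
    return rows[1:]


def empty_spans(rows):
    """Return (start, end) word indices, inclusive, of cells with 0 hypotheses."""
    spans = []
    for extra_width, row in enumerate(rows):
        for start, count in enumerate(row):
            if count == 0:
                spans.append((start, start + extra_width))
    return spans


def has_full_parse(rows):
    """True if the cell covering the whole sentence holds a hypothesis."""
    return rows[-1][0] > 0


chart = """
0 1 2
1 3 0
0 0
0
"""
rows = parse_chart(chart)
print(has_full_parse(rows), empty_spans(rows))
```

Under that reading, the output above has entries for "<s>" and for the middle token but nothing for any span wider than one word, which matches the "NO BEST TRANSLATION" line.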
> This is from a fresh svn checkout that I compiled just before running.
> The glue rules seem to be failing when trying to combine the chart
> cells that cover "<s> 说了算".
>
> My glue grammar has 4666 entries in it, for what it's worth. I can
> send it to you if you want, but it might be too big to put up here on
> the forum.
>
> Is there a quick-and-dirty way to see what categories are inserted
> into which cells when (some verbosity setting, perhaps)?
>
> > I corrected this behaviour recently
> >
> http://mosesdecoder.svn.sourceforge.net/viewvc/mosesdecoder/trunk/moses/src/ChartTranslationOptionCollection.cpp?r1=4004&r2=4003&pathrev=4004
>
> Ah, yes. I named the binary ...19-june-2011, but I had copied it from
> a previous svn checkout, sorry. These things are still happening on
> the latest checkout, though.
>
> --D.N.
>
> 2011/6/22 Hieu Hoang <[email protected]>
>
>     hi dennis
>
>     You're right, it should be working. The entries in the glue rules
>     might be pruned. Can you try to change the [ttable-limit] in the
>     ini file to
>     [ttable-limit]
>     100
>     10000000
>     or
>     [ttable-limit]
>     100
>     0
>
>     Each row corresponds to the pruning limit for one table. If
>     you provide only 1 entry, then it prunes every table uniformly
>     (see StaticData.cpp, line 894).
>     For a grammar with lots of non-terminals like yours, the table
>     limit may be cutting off some of the entries in the glue rule
>     table.
>
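
As a rough illustration of the [ttable-limit] behaviour described above, here is a small Python sketch. It is not the code in StaticData.cpp; the function names are hypothetical, and treating a limit of 0 as "no pruning" is an assumption based on the two settings suggested above.

```python
def expand_limits(limits, num_tables):
    """One limit per table; a single entry is applied to every table."""
    if len(limits) == 1:
        return list(limits) * num_tables
    if len(limits) != num_tables:
        raise ValueError("expected 1 or %d limits, got %d"
                         % (num_tables, len(limits)))
    return list(limits)


def prune(entries, limit):
    """Keep the `limit` best-scoring (rule, score) pairs.

    A limit of 0 keeps everything (assumption, see the note above).
    """
    if limit == 0:
        return list(entries)
    return sorted(entries, key=lambda e: e[1], reverse=True)[:limit]


# Two tables: the rule table limited to 100 entries, the glue table unlimited.
limits = expand_limits([100, 0], num_tables=2)
glue_entries = [("glue-rule-%d" % i, 1.0) for i in range(4666)]
print(limits, len(prune(glue_entries, limits[1])))
```

With a single limit of, say, 20 applied to both tables, the same call would keep only 20 of the 4666 glue entries, which is the kind of cut-off suspected here.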
>     Also, the decoder shouldn't be processing <s> and </s> as unknown
>     words, they should only be translated by the glue rules. This is
>     the reason you get 1587 translations of </s>.
>
>     I corrected this behaviour recently
>     http://mosesdecoder.svn.sourceforge.net/viewvc/mosesdecoder/trunk/moses/src/ChartTranslationOptionCollection.cpp?r1=4004&r2=4003&pathrev=4004
>
>
>
>     On 23/06/2011 06:14, Dennis Mehay wrote:
>>     Just in case it confuses anyone, both commands (below) were run
>>     in the same way, I just simplified it for expository purposes to
>>     " moses_chart -f moses.ini -cube-pruning-pop-limit 2000" in the
>>     first case, but not in the second.
>>
>>     --D.N.
>>
>>     2011/6/22 Dennis Mehay <[email protected]>
>>
>>         Hi Philipp,
>>
>>         Thanks for the reply. I tracked some of the cases down to a
>>         *known* word (or whitespace-tokenized thingie, anyway -- I
>>         don't know much of what constitutes a word in written
>>         Chinese) by doing the following:
>>
>>         
>> ----------------------------------------------------------------------
>>         $ echo "说了算" | moses_chart -f moses.ini
>>         -cube-pruning-pop-limit 2000
>>
>>         Translating: <s> 说了算 </s> ||| [0,0]=X (1) [0,1]=X (1)
>>         [0,2]=X (1) [1,1]=X (1) [1,2]=X (1) [2,2]=X (1)
>>
>>         Num of hypo = 1591 --- cells:
>>         0 1 2
>>         1 3 1587
>>         0 0
>>         0
>>         NO BEST TRANSLATION
>>         
>> ----------------------------------------------------------------------
>>
>>         (An aside: 1587 is the number of categories in the unknown
>>         word list. Why does the last token, viz., "</s>", get that
>>         many cells? )
>>
>>         Anyhow, sure enough, there are three entries for the middle
>>         token "说了算"
>>
>>         
>> ----------------------------------------------------------------------
>>         $ zless rule-table.gz
>>         ...
>>         说了算 [X] ||| is [((S\NP[expl])/(S[to]\NP))/(S[adj]\NP)] |||
>>         0.000113126 6.94e-05 0.00475133 0.5 2.718 ||| ||| 126 3
>>         说了算 [X] ||| is necessary [(S\NP[expl])/(S[to]\NP)] |||
>>         0.000309866 6.94e-05 0.00475133 0.00028945 2.718 ||| ||| 46 3
>>         说了算 [X] ||| is necessary to [(S\NP[expl])/(S[b]\NP)] |||
>>         0.000208847 6.94e-05 0.00475133 1.07891e-05 2.718 ||| ||| 68.25 3
>>         ...
>>         
>> ----------------------------------------------------------------------
>>
>>         There are entries in the glue table for these three
>>         categories -- ((S\NP[expl])/(S[to]\NP))/(S[adj]\NP),
>>         (S\NP[expl])/(S[to]\NP) and (S\NP[expl])/(S[b]\NP) --- so we
>>         should be able to hack together a translation using any of them.
>>
>>         
>> ----------------------------------------------------------------------
>>         <s> [X] ||| <s> [Q] ||| 1 |||
>>         ...
>>         [X][Q] [X][((S\NP[expl])/(S[to]\NP))/(S[adj]\NP)] [X] |||
>>         [X][Q] [X][((S\NP[expl])/(S[to]\NP))/(S[adj]\NP)] [Q] |||
>>         2.718 ||| 0-0 1-1
>>         ...
>>         [X][Q] [X][(S\NP[expl])/(S[to]\NP)] [X] ||| [X][Q]
>>         [X][(S\NP[expl])/(S[to]\NP)] [Q] ||| 2.718 ||| 0-0 1-1
>>         ...
>>         [X][Q] [X][(S\NP[expl])/(S[b]\NP)] [X] ||| [X][Q]
>>         [X][(S\NP[expl])/(S[b]\NP)] [Q] ||| 2.718 ||| 0-0 1-1
>>         ...
>>         
>> ----------------------------------------------------------------------
>>
>>         And just to be sure that it isn't an unknown word problem,
>>         let's mangle the token "说了算" by deleting the last
>>         character and see what happens:
>>
>>         
>> ----------------------------------------------------------------------
>>         $ echo "说了" | ../moses/bin/moses-chart-19-june-2011 -f
>>         dev-test/ZhEn/mert/run1.moses.ini -cube-pruning-pop-limit 2000
>>         Translating: <s> 说了 </s> ||| [0,0]=X (1) [0,1]=X (1)
>>         [0,2]=X (1) [1,1]=X (1) [1,2]=X (1) [2,2]=X (1)
>>
>>         Num of hypo = 6396 --- cells:
>>         0 1 2
>>         1 1587 1587
>>         1 0
>>         1
>>         BEST TRANSLATION: 4763 Q </s> :0-0 : pC=0.000, c=-1.002
>>         [0..2] 3176 [total=-22.789] <<-1.303, -1.940, -46.302, 0.000,
>>         0.000, 0.000, 0.000, 0.000, 1.000>>
>>         说了
>>         
>> ----------------------------------------------------------------------
>>
>>         The best "translation" is just a pass-through, as expected
>>         (and there are 1587 nodes for that unknown token -- just as
>>         many as there are unknown word lhs's in the unknown-lhs file).
>>
>>         Strange. Very strange. Or am I missing the obvious?
>>
>>         I'm at a loss here. Does anyone have any guesses as to what's
>>         going on here?
>>
>>         --D.N.
>>
>>
>>         2011/6/22 Philipp Koehn <[email protected]>
>>
>>             Hi,
>>
>>             there always should be a rule to combine a span to the left.
>>
>>             Check what labels are chosen for the 13th word, and why there
>>             are no glue rules for it.
>>
>>             If I would hazard a guess, I would suspect that this is an
>>             unknown word and a file with the likely labels for unknown words
>>             is used, but these do not match the glue grammar.
>>
>>             -phi
>>
>>             2011/6/22 Dennis Mehay <[email protected]>:
>>             > Hi all,
>>             >
>>             > I posted this, but it bounced. My attachments were too big. I'm resending
>>             > without the larger attachment. Apologies for any duplicate posting.
>>             >
>>             > I'm running moses_chart to do some syntax-based MT experiments, and, during
>>             > tuning, I'm coming across some instances where the decoder can't produce a
>>             > translation (btw 32 and 38 in a 500 sentence tuning set). This should not
>>             > be happening, so far as I can tell, since I have a glue grammar (where all
>>             > the nonterminals of the training set plus the [Q] nonterminal are accounted
>>             > for), and an 'unknown-lhs' list with the relative frequencies of all the
>>             > categories as they span only a single word in the training set (i.e., the
>>             > frequency of each category's spanning a single word in the rule table / the
>>             > total number of single-word instances in the rule table).
>>             >
>>             > Here is an example of a sentence that there was no translation for:
>>             >
>>             > ---------------------------------------------------------------------------------------
>>             > Translating: <s> 没有 规划 作 指导 , 就 可能 出现 谁 有 权 谁 说了算 , 谁 官 大 谁 说了算 . </s>
>>             > ...
>>             > Decoding:
>>             > Num of hypo = 84813 --- cells:
>>             > 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
>>             > 1 100 77 93 83 99 99 100 100 85 99 43 85 3 99 85 18 100 85 3 14 1000
>>             > 40 960 278 717 916 857 976 276 396 952 958 150 0 0 919 74 402 802 0 0 12
>>             > 200 975 908 849 850 858 968 971 971 862 974 0 0 0 852 865 984 0 0 0
>>             > 200 940 849 889 763 715 990 962 979 905 0 0 0 0 864 984 0 0 0
>>             > 200 868 939 886 863 803 887 861 981 0 0 0 0 0 871 0 0 0
>>             > 200 828 910 801 838 796 722 870 0 0 0 0 0 0 0 0 0
>>             > 200 799 914 832 801 745 926 0 0 0 0 0 0 0 0 0
>>             > 200 756 819 901 693 692 0 0 0 0 0 0 0 0 0
>>             > 200 716 680 665 437 0 0 0 0 0 0 0 0 0
>>             > 200 683 527 929 0 0 0 0 0 0 0 0 0
>>             > 200 532 588 0 0 0 0 0 0 0 0 0
>>             > 200 580 0 0 0 0 0 0 0 0 0
>>             > 200 0 0 0 0 0 0 0 0 0
>>             > 0 0 0 0 0 0 0 0 0
>>             > 0 0 0 0 0 0 0 0
>>             > 0 0 0 0 0 0 0
>>             > 0 0 0 0 0 0
>>             > 0 0 0 0 0
>>             > 0 0 0 0
>>             > 0 0 0
>>             > 0 0
>>             > 0
>>             > NO BEST TRANSLATION
>>             >
>>             > Translation took 4.340 seconds
>>             >
>>             > ---------------------------------------------------------------------------------------
>>             >
>>             > The ASCII-art chart's alignment may be a bit off, but, just eye-balling it,
>>             > it looks as if the 19th word (index 18) has a chart entry count above it,
>>             > but then this entry does not get combined with what's to the left using the
>>             > glue rules.
>>             >
>>             > Could this be a pruning or cutoff issue (i.e., stack size,
>>             > cube-pruning-pop-limit, maximum number of rules per span, etc.)? Or maybe
>>             > it has to do with the fact that my unknown-lhs file has *all* categories
>>             > that spanned a single word in the training set. Maybe I should prune it to
>>             > the top 10 or 20, or so. I'm really at a loss here. I thought the glue
>>             > grammar would make the decoder always return an answer, no matter how awful.
>>             >
>>             > Any insight?
>>             >
>>             > I have attached my moses.ini file in case anyone wants to have a look. I
>>             > can also send the glue rule file later, but, as I said, it seems to account
>>             > for all of the training set's categories (and it was produced automatically
>>             > using the -glue-grammar option).
>>             >
>>             > Best,
>>             > Dennis
>>             > _______________________________________________
>>             > Moses-support mailing list
>>             > [email protected] <mailto:[email protected]>
>>             > http://mailman.mit.edu/mailman/listinfo/moses-support
>>             >
>>             >
>>
>>
>>
>>
>
>
>