Just in case it confuses anyone: both commands below were run the same way.
For expository purposes I simplified the first one to "moses_chart -f
moses.ini -cube-pruning-pop-limit 2000", but I did not simplify the second.

--D.N.

2011/6/22 Dennis Mehay <[email protected]>

> Hi Philipp,
>
> Thanks for the reply.  I tracked some of the cases down to a *known* word
> (or whitespace-tokenized thingie, anyway; I don't know much about what
> constitutes a word in written Chinese) by doing the following:
>
> ----------------------------------------------------------------------
> $ echo "说了算" |  moses_chart -f moses.ini -cube-pruning-pop-limit 2000
>
> Translating: <s> 说了算 </s>  ||| [0,0]=X (1) [0,1]=X (1) [0,2]=X (1) [1,1]=X (1) [1,2]=X (1) [2,2]=X (1)
>
> Num of hypo = 1591 --- cells:
>   0   1   2
>   1   3 1587
>     0   0
>       0
> NO BEST TRANSLATION
> ----------------------------------------------------------------------
>
> (An aside: 1587 is the number of categories in the unknown-word list.  Why
> does the last token, viz., "</s>", get that many hypotheses in its cell?)
>
> Anyhow, sure enough, there are three entries for the middle token "说了算":
>
> ----------------------------------------------------------------------
> $ zless rule-table.gz
> ...
> 说了算 [X] ||| is [((S\NP[expl])/(S[to]\NP))/(S[adj]\NP)] ||| 0.000113126 6.94e-05 0.00475133 0.5 2.718 ||| ||| 126 3
> 说了算 [X] ||| is necessary [(S\NP[expl])/(S[to]\NP)] ||| 0.000309866 6.94e-05 0.00475133 0.00028945 2.718 ||| ||| 46 3
> 说了算 [X] ||| is necessary to [(S\NP[expl])/(S[b]\NP)] ||| 0.000208847 6.94e-05 0.00475133 1.07891e-05 2.718 ||| ||| 68.25 3
> ...
> ----------------------------------------------------------------------
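The lookup above can also be done programmatically. Below is a minimal Python sketch that lists the target-side left-hand-side labels a rule table offers for one source phrase. It assumes the five-field layout visible above (source ||| target ||| scores ||| alignment ||| counts), with the LHS as the last token on each side. parse_rule and target_labels_for are hypothetical helper names, not part of Moses.

```python
# Sketch: list the target-side LHS labels that a rule table offers for one
# source phrase. ASSUMES Moses-style lines of the form
#   source [X] ||| target [LABEL] ||| scores ||| alignment ||| counts
# where the last token on each side is that side's left-hand-side label.

def parse_rule(line):
    """Return (source words, source LHS, target words, target LHS) for one rule."""
    fields = [f.strip() for f in line.split("|||")]
    src, tgt = fields[0].split(), fields[1].split()
    # Strip only the outermost brackets: CCG labels such as
    # (S\NP[expl])/(S[to]\NP) contain brackets of their own.
    return src[:-1], src[-1][1:-1], tgt[:-1], tgt[-1][1:-1]


def target_labels_for(lines, source_phrase):
    """Collect the target LHS labels of all rules whose source side is source_phrase."""
    wanted = source_phrase.split()
    labels = set()
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        src_words, _, _, tgt_lhs = parse_rule(line)
        if src_words == wanted:
            labels.add(tgt_lhs)
    return labels


if __name__ == "__main__":
    # Raw strings, because labels like S\NP would otherwise be read as escapes.
    rules = [
        r"说了算 [X] ||| is [((S\NP[expl])/(S[to]\NP))/(S[adj]\NP)] ||| 0.000113126 6.94e-05 0.00475133 0.5 2.718 ||| ||| 126 3",
        r"说了算 [X] ||| is necessary [(S\NP[expl])/(S[to]\NP)] ||| 0.000309866 6.94e-05 0.00475133 0.00028945 2.718 ||| ||| 46 3",
    ]
    for label in sorted(target_labels_for(rules, "说了算")):
        print(label)
```

To run it on the real table, open it with gzip.open("rule-table.gz", "rt", encoding="utf-8") and pass the file object as lines.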
>
> The glue grammar contains entries for all three of these categories,
> ((S\NP[expl])/(S[to]\NP))/(S[adj]\NP), (S\NP[expl])/(S[to]\NP) and
> (S\NP[expl])/(S[b]\NP), so we should be able to hack together a
> translation using any of them:
>
> ----------------------------------------------------------------------
> <s> [X] ||| <s> [Q] ||| 1  |||
> ...
> [X][Q] [X][((S\NP[expl])/(S[to]\NP))/(S[adj]\NP)] [X] ||| [X][Q] [X][((S\NP[expl])/(S[to]\NP))/(S[adj]\NP)] [Q] ||| 2.718 ||| 0-0 1-1
> ...
> [X][Q] [X][(S\NP[expl])/(S[to]\NP)] [X] ||| [X][Q] [X][(S\NP[expl])/(S[to]\NP)] [Q] ||| 2.718 ||| 0-0 1-1
> ...
> [X][Q] [X][(S\NP[expl])/(S[b]\NP)] [X] ||| [X][Q] [X][(S\NP[expl])/(S[b]\NP)] [Q] ||| 2.718 ||| 0-0 1-1
> ...
> ----------------------------------------------------------------------
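Whether every label of interest has a glue rule can be checked mechanically instead of by scrolling. The Python sketch below makes two assumptions. First, glue rules have exactly the shape shown above ([X][Q] [X][LABEL] [X] ||| [X][Q] [X][LABEL] [Q] ||| ...). Second, the source-side label (here X) never contains "][". It collects the glued target labels and reports which labels from a given set are missing. glue_labels and missing_glue are hypothetical helpers.

```python
# Sketch: which labels have a glue rule? ASSUMES glue rules shaped like
#   [X][Q] [X][LABEL] [X] ||| [X][Q] [X][LABEL] [Q] ||| 2.718 ||| 0-0 1-1
# i.e. target-side nonterminals written as [SOURCE-LABEL][TARGET-LABEL].

def glue_labels(glue_lines):
    """Return the target labels that appear as a glued nonterminal (excluding Q)."""
    labels = set()
    for line in glue_lines:
        fields = [f.strip() for f in line.split("|||")]
        if len(fields) < 2:
            continue
        # Skip the last target token: it is the rule's own LHS, e.g. [Q].
        for tok in fields[1].split()[:-1]:
            cut = tok.find("][")  # boundary between source and target labels
            if tok.startswith("[") and tok.endswith("]") and cut != -1:
                label = tok[cut + 2:-1]
                if label != "Q":
                    labels.add(label)
    return labels


def missing_glue(needed_labels, glue_lines):
    """Labels from needed_labels for which no glue rule exists."""
    return set(needed_labels) - glue_labels(glue_lines)
```

An empty result from missing_glue for the labels of a failing word rules out a simple coverage gap in the glue grammar.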
>
> And just to be sure that it isn't an unknown-word problem, let's mangle the
> token "说了算" by deleting its last character and see what happens:
>
> ----------------------------------------------------------------------
> $ echo "说了" | ../moses/bin/moses-chart-19-june-2011 -f dev-test/ZhEn/mert/run1.moses.ini -cube-pruning-pop-limit 2000
>
> Translating: <s> 说了 </s>  ||| [0,0]=X (1) [0,1]=X (1) [0,2]=X (1) [1,1]=X (1) [1,2]=X (1) [2,2]=X (1)
>
> Num of hypo = 6396 --- cells:
>   0   1   2
>   1 1587 1587
>     1   0
>       1
> BEST TRANSLATION:  4763 Q </s> :0-0 : pC=0.000, c=-1.002 [0..2] 3176 [total=-22.789] <<-1.303, -1.940, -46.302, 0.000, 0.000, 0.000, 0.000, 0.000, 1.000>>
> 说了
> ----------------------------------------------------------------------
>
> The best "translation" is just a pass-through, as expected, and there are
> 1587 hypotheses for that unknown token: exactly as many as there are
> unknown-word LHS labels in the unknown-lhs file.
>
> Strange. Very strange.  Or am I missing the obvious?
>
> I'm at a loss.  Does anyone have any guesses as to what's going on here?
>
> --D.N.
>
>
> 2011/6/22 Philipp Koehn <[email protected]>
>
>> Hi,
>>
>> there should always be a rule that combines a span with what is to its
>> left.
>>
>> Check what labels are chosen for the 13th word, and why there
>> are no glue rules for it.
>>
>> If I were to hazard a guess, I would suspect that this is an
>> unknown word and that a file with the likely labels for unknown words
>> is used, but these labels do not match the glue grammar.
>>
>> -phi
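Philipp's hypothesis can be tested directly by comparing the unknown-LHS list against the set of glued labels. The sketch below assumes the unknown-lhs file holds one whitespace-separated "LABEL probability" pair per line. That format is an assumption and should be checked against the actual file. The function takes the glued labels as a plain set and returns the labels (with probabilities) that an unknown word could receive but that no glue rule could attach. The helper names are hypothetical.

```python
# Sketch: which unknown-word labels can no glue rule attach?
# ASSUMPTION: the unknown-lhs file holds one "LABEL probability" pair per line.

def read_unknown_lhs(lines):
    """Parse unknown-lhs lines into {label: probability}, skipping malformed lines."""
    entries = {}
    for line in lines:
        parts = line.split()
        if len(parts) == 2:
            entries[parts[0]] = float(parts[1])
    return entries


def unglued_unknown_labels(unknown_lhs_lines, glued_labels):
    """Return {label: probability} for unknown-word labels with no glue rule.

    sum(result.values()) is the probability mass an unknown word puts on
    labels that cannot be glued to the rest of the sentence."""
    entries = read_unknown_lhs(unknown_lhs_lines)
    return {label: p for label, p in entries.items() if label not in glued_labels}
```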
>>
>> 2011/6/22 Dennis Mehay <[email protected]>:
>> > Hi all,
>> >
>> > I posted this earlier, but it bounced because my attachments were too
>> > big, so I'm resending without the larger attachment.  Apologies for any
>> > duplicate posting.
>> >
>> > I'm running moses_chart to do some syntax-based MT experiments, and,
>> > during tuning, I'm coming across some instances where the decoder can't
>> > produce a translation (between 32 and 38 sentences of a 500-sentence
>> > tuning set).  So far as I can tell, this should not be happening, since I
>> > have a glue grammar (in which all the nonterminals of the training set
>> > plus the [Q] nonterminal are accounted for) and an 'unknown-lhs' list with
>> > the relative frequencies of all the categories as they span only a single
>> > word in the training set (i.e., the number of times each category spans a
>> > single word in the rule table divided by the total number of single-word
>> > instances in the rule table).
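The relative-frequency recipe just described can be sketched in Python. The sketch makes two assumptions. First, a "single-word instance" is a rule whose source side is exactly one terminal token. Second, every such rule line counts once rather than being weighted by the counts in its last field. The output layout of format_unknown_lhs ("LABEL probability", most probable first) is also an assumption. Both function names are hypothetical.

```python
from collections import Counter

# Sketch of the unknown-lhs recipe:
#   P(label) = (# single-word rules whose target LHS is label) / (# single-word rules)
# ASSUMPTIONS: a single-word rule has exactly one terminal on its source side,
# and each rule line counts once (not weighted by the counts in its last field).

def unknown_lhs_distribution(rule_lines):
    counts = Counter()
    for line in rule_lines:
        fields = [f.strip() for f in line.split("|||")]
        if len(fields) < 2:
            continue
        src, tgt = fields[0].split(), fields[1].split()
        words = src[:-1]  # drop the source LHS, e.g. [X]
        # Nonterminals such as [X][NP] start with "["; terminals are assumed not to.
        if len(words) == 1 and not words[0].startswith("["):
            counts[tgt[-1][1:-1]] += 1  # target LHS without its outer brackets
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.items()}


def format_unknown_lhs(dist):
    """One "LABEL probability" line per label, most probable first (assumed layout)."""
    items = sorted(dist.items(), key=lambda kv: (-kv[1], kv[0]))
    return "\n".join(f"{label} {p:.6g}" for label, p in items)
```

Weighting by the rule counts instead would only change the += 1 line. Which choice matches the original experiment is not stated.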
>> >
>> > Here is an example of a sentence for which no translation was produced:
>> >
>> > ---------------------------------------------------------------------------------------
>> > Translating: <s> 没有 规划 作 指导 , 就 可能 出现 谁 有 权 谁 说了算 , 谁 官 大 谁 说了算 . </s>
>> > ...
>> > Decoding:
>> > Num of hypo = 84813 --- cells:
>> >   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21
>> >   1 100  77  93  83  99  99 100 100  85  99  43  85   3  99  85  18 100  85   3  14 1000
>> >    40 960 278 717 916 857 976 276 396 952 958 150   0   0 919  74 402 802   0   0  12
>> >     200 975 908 849 850 858 968 971 971 862 974   0   0   0 852 865 984   0   0   0
>> >       200 940 849 889 763 715 990 962 979 905   0   0   0   0 864 984   0   0   0
>> >         200 868 939 886 863 803 887 861 981   0   0   0   0   0 871   0   0   0
>> >           200 828 910 801 838 796 722 870   0   0   0   0   0   0   0   0   0
>> >             200 799 914 832 801 745 926   0   0   0   0   0   0   0   0   0
>> >               200 756 819 901 693 692   0   0   0   0   0   0   0   0   0
>> >                 200 716 680 665 437   0   0   0   0   0   0   0   0   0
>> >                   200 683 527 929   0   0   0   0   0   0   0   0   0
>> >                     200 532 588   0   0   0   0   0   0   0   0   0
>> >                       200 580   0   0   0   0   0   0   0   0   0
>> >                         200   0   0   0   0   0   0   0   0   0
>> >                             0   0   0   0   0   0   0   0   0
>> >                               0   0   0   0   0   0   0   0
>> >                                 0   0   0   0   0   0   0
>> >                                   0   0   0   0   0   0
>> >                                     0   0   0   0   0
>> >                                       0   0   0   0
>> >                                         0   0   0
>> >                                           0   0
>> >                                             0
>> > NO BEST TRANSLATION
>> >
>> > Translation took 4.340 seconds
>> >
>> > ---------------------------------------------------------------------------------------
>> >
>> > The ASCII-art chart's alignment may be a bit off, but, just eyeballing
>> > it, it looks as if the 19th word (index 18) has a non-zero count in the
>> > top row of the chart (its single-word cell), yet that entry never gets
>> > combined with what's to its left by the glue rules.
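Eyeballing the triangle is error-prone once counts wrap or exceed three digits. The Python sketch below parses the cell counts and reports the first token that no non-empty span starting at <s> reaches. That is roughly where a left-to-right glue derivation gets stuck. It assumes the layout printed above: a header line of column indices, then line w holding the counts for spans of width w, one per start position. The function names are hypothetical. For the failing sentence's chart, the non-empty spans starting at position 0 stop at width 13, so this sketch reports position 13 (the first 说了算).

```python
# Sketch: read the cell-count triangle that moses_chart prints after "cells:"
# and find where derivations anchored at <s> stop. ASSUMED layout: a header line
# of column indices, then line w (1-based) with the counts for spans of width w,
# one per start position.

def parse_chart(text):
    """Return rows where rows[w - 1][i] is the count for the span of width w starting at i."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    n = len(lines[0].split())  # number of tokens, including <s> and </s>
    rows = [[int(tok) for tok in ln.split()] for ln in lines[1:]]
    if len(rows) != n:
        raise ValueError(f"expected {n} rows, got {len(rows)}")
    for w, row in enumerate(rows, start=1):
        if len(row) != n - w + 1:
            raise ValueError(f"row {w} has {len(row)} cells, expected {n - w + 1}")
    return rows


def first_uncovered_position(rows):
    """Index of the first token that no non-empty span starting at position 0
    reaches, or None if the span covering the whole input is non-empty."""
    n = len(rows)
    widest = max((w for w in range(1, n + 1) if rows[w - 1][0] > 0), default=0)
    return None if widest == n else widest


if __name__ == "__main__":
    chart = "  0   1   2\n  1   3 1587\n    0   0\n      0"
    print(first_uncovered_position(parse_chart(chart)))  # 1, i.e. the token 说了算
```

The email quote markers (">> > ") must be stripped before the triangle is passed to parse_chart.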
>> >
>> > Could this be a pruning or cutoff issue (e.g., stack size,
>> > cube-pruning-pop-limit, maximum number of rules per span, etc.)?  Or maybe
>> > it has to do with the fact that my unknown-lhs file has *all* categories
>> > that spanned a single word in the training set.  Maybe I should prune it
>> > to the top 10 or 20 or so.  I'm really at a loss here.  I thought the glue
>> > grammar would make the decoder always return an answer, no matter how
>> > awful.
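Pruning the unknown-LHS list to the top 10 or 20 labels, as floated here, amounts to keeping the k most probable entries and renormalizing them. Below is a minimal Python sketch over a {label: probability} mapping. prune_top_k is a hypothetical helper, not a Moses option, and ties are broken alphabetically so the result does not depend on input order.

```python
def prune_top_k(dist, k):
    """Keep the k most probable labels of {label: probability} and renormalize.

    Ties are broken alphabetically so the result is deterministic."""
    top = sorted(dist.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
    mass = sum(p for _, p in top)
    if mass == 0:
        return {}
    return {label: p / mass for label, p in top}
```

Whatever k is chosen, each surviving label still needs a matching glue rule for an unknown word to be attachable.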
>> >
>> > Any insight?
>> >
>> > I have attached my moses.ini file in case anyone wants to have a look.  I
>> > can also send the glue rule file later, but, as I said, it seems to
>> > account for all of the training set's categories (and it was produced
>> > automatically using the -glue-grammar option).
>> >
>> > Best,
>> > Dennis
>> > _______________________________________________
>> > Moses-support mailing list
>> > [email protected]
>> > http://mailman.mit.edu/mailman/listinfo/moses-support
>> >
>> >
>>
>
>