Re: [Moses-support] NO BEST TRANSLATION in moses_chart (glue grammar failure?)

Dennis Mehay Wed, 22 Jun 2011 19:56:52 -0700

Hi Hieu,

With [ttable-limit] set to 100 and 0:
--------------------------------------------------------
Translating: <s> 说了算 </s>  ||| [0,0]=X (1) [0,1]=X (1) [0,2]=X (1) [1,1]=X
(1) [1,2]=X (1) [2,2]=X (1)


Num of hypo = 4 --- cells:

  0   1   2
  1   3   0
    0   0
      0
NO BEST TRANSLATION
--------------------------------------------------------

And with [ttable-limit] set to 100 and 100000000:
--------------------------------------------------------
Translating: <s> 说了算 </s>  ||| [0,0]=X (1) [0,1]=X (1) [0,2]=X (1) [1,1]=X
(1) [1,2]=X (1) [2,2]=X (1)

Num of hypo = 4 --- cells:
  0   1   2
  1   3   0
    0   0
      0
NO BEST TRANSLATION
--------------------------------------------------------

This is from a fresh svn checkout that I compiled just before running.  The
glue rules seem to be failing when trying to combine the chart cells that
cover "<s> 说了算".
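
To read the cell dumps above mechanically, something like the following might help. It is only a sketch, and it rests on my reading of the output rather than anything confirmed from the decoder source: I assume the first row lists word positions and the k-th row after it holds hypothesis counts for spans of k consecutive words. The names parse_cells and longest_prefix are made up for illustration.

```python
def parse_cells(dump):
    """Map (start, end) spans to hypothesis counts from a 'cells:' dump.

    Assumption: line 1 lists positions; the k-th line after it holds the
    counts for spans that cover k consecutive words.
    """
    rows = [line.split() for line in dump.strip().splitlines()]
    n = len(rows[0])  # number of input positions, including <s> and </s>
    cells = {}
    for width, row in enumerate(rows[1:], start=1):
        expected = n - width + 1
        if len(row) != expected:
            raise ValueError(
                f"width {width}: got {len(row)} cells, expected {expected}"
            )
        for start, count in enumerate(row):
            cells[(start, start + width - 1)] = int(count)
    return cells


def longest_prefix(cells):
    """Largest end index e such that span [0, e] has at least one hypothesis."""
    ends = [end for (start, end), count in cells.items()
            if start == 0 and count > 0]
    return max(ends, default=-1)


# Cell dump from the failing run on "说了算" (copied from the output above).
FAILED = """
  0   1   2
  1   3   0
    0   0
      0
"""

# Cell dump from the run on the mangled token "说了", which did translate.
SUCCEEDED = """
  0   1   2
  1 1587 1587
    1   0
      1
"""

if __name__ == "__main__":
    print(longest_prefix(parse_cells(FAILED)))     # span [0,1] is empty
    print(longest_prefix(parse_cells(SUCCEEDED)))  # full span is covered
```

Under that reading, the failing run never gets past the <s> cell on its own, while the "说了" run reaches the full span [0,2].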

My glue grammar has 4666 entries in it, for what it's worth.  I can send it
to you if you want, but it might be too big to put up here on the forum.
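
In the meantime, a coverage check along these lines might narrow things down. It is a rough sketch that assumes the rule-table and glue-grammar line formats quoted further down in this thread; target_label, glue_labels and uncovered are hypothetical helper names, and the sample lines are abbreviated from the ones I posted (scores trimmed).

```python
def target_label(rule_line):
    """Return the target-side LHS label of a rule-table line.

    Assumes fields are separated by '|||' and that the target field ends
    with the bracketed label, as in the rule-table excerpt in this thread.
    """
    target_field = rule_line.split("|||")[1]
    last_token = target_field.split()[-1]   # e.g. "[(S\NP[expl])/(S[b]\NP)]"
    return last_token[1:-1]                 # strip the outer brackets


def glue_labels(glue_lines):
    """Collect labels that have a glue rule '[X][Q] [X][LABEL] [X] ||| ...'."""
    labels = set()
    for line in glue_lines:
        source = line.split("|||")[0].split()
        if (len(source) == 3 and source[0] == "[X][Q]"
                and source[1].startswith("[X][") and source[2] == "[X]"):
            labels.add(source[1][4:-1])     # drop "[X][" and the final "]"
    return labels


def uncovered(rule_lines, glue_lines):
    """Labels used on the target side of rules but absent from the glue rules."""
    covered = glue_labels(glue_lines)
    return sorted({target_label(line) for line in rule_lines} - covered)


# Sample lines abbreviated from this thread (scores trimmed for brevity).
RULES = [
    r"说了算 [X] ||| is [((S\NP[expl])/(S[to]\NP))/(S[adj]\NP)] ||| 0.000113126",
    r"说了算 [X] ||| is necessary [(S\NP[expl])/(S[to]\NP)] ||| 0.000309866",
    r"说了算 [X] ||| is necessary to [(S\NP[expl])/(S[b]\NP)] ||| 0.000208847",
]
GLUE = [
    r"<s> [X] ||| <s> [Q] ||| 1  |||",
    r"[X][Q] [X][((S\NP[expl])/(S[to]\NP))/(S[adj]\NP)] [X] ||| [X][Q] [X][((S\NP[expl])/(S[to]\NP))/(S[adj]\NP)] [Q] ||| 2.718 ||| 0-0 1-1",
    r"[X][Q] [X][(S\NP[expl])/(S[to]\NP)] [X] ||| [X][Q] [X][(S\NP[expl])/(S[to]\NP)] [Q] ||| 2.718 ||| 0-0 1-1",
    r"[X][Q] [X][(S\NP[expl])/(S[b]\NP)] [X] ||| [X][Q] [X][(S\NP[expl])/(S[b]\NP)] [Q] ||| 2.718 ||| 0-0 1-1",
]

if __name__ == "__main__":
    # An empty list means every target label in RULES has a glue rule.
    print(uncovered(RULES, GLUE))
```

For the three "说了算" rules this comes back empty, which matches what I see by eye: the glue entries are in the file, so whatever is going wrong happens after the tables are loaded.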

Is there a quick-and-dirty way to see what categories are inserted into
which cells when (some verbosity setting, perhaps)?

> I corrected this behaviour recently
>
> http://mosesdecoder.svn.sourceforge.net/viewvc/mosesdecoder/trunk/moses/src/ChartTranslationOptionCollection.cpp?r1=4004&r2=4003&pathrev=4004

Ah, yes.  I named the binary ...19-june-2011, but I had copied it from a
previous svn checkout, sorry.  These things are still happening on the
latest checkout, though.

--D.N.

2011/6/22 Hieu Hoang <[email protected]>

> hi dennis
>
> You're right, it should be working. The entries in the glue rules might be
> pruned. Can you try to change the [ttable-limit] in the ini file to
>    [ttable-limit]
>    100
>    10000000
> or
>    [ttable-limit]
>    100
>    0
>
> Each row corresponds to the pruning limit for one table. If you
> provide only 1 entry, then it prunes every table uniformly.
>    StaticData.cpp (line 894)
> For a grammar with lots of non-terminals like yours, the table limit may be
> cutting off some of the entries in the glue rule table.
>
> Also, the decoder shouldn't be processing <s> and </s> as unknown words,
> they should only be translated by the glue rules. This is the reason you get
> 1587 translations of </s>.
>
> I corrected this behaviour recently
>
> http://mosesdecoder.svn.sourceforge.net/viewvc/mosesdecoder/trunk/moses/src/ChartTranslationOptionCollection.cpp?r1=4004&r2=4003&pathrev=4004
>
>
> On 23/06/2011 06:14, Dennis Mehay wrote:
>
> Just in case it confuses anyone, both commands (below) were run in the same
> way, I just simplified it for expository purposes to " moses_chart -f
> moses.ini -cube-pruning-pop-limit 2000" in the first case, but not in the
> second.
>
> --D.N.
>
> 2011/6/22 Dennis Mehay <[email protected]>
>
>> Hi Philipp,
>>
>> Thanks for the reply.  I tracked some of the cases down to a *known* word
>> (or whitespace-tokenized thingie, anyway -- I don't know much of what
>> constitutes a word in written Chinese) by doing the following:
>>
>> ----------------------------------------------------------------------
>> $ echo "说了算" |  moses_chart -f moses.ini -cube-pruning-pop-limit 2000
>>
>> Translating: <s> 说了算 </s>  ||| [0,0]=X (1) [0,1]=X (1) [0,2]=X (1) [1,1]=X
>> (1) [1,2]=X (1) [2,2]=X (1)
>>
>> Num of hypo = 1591 --- cells:
>>   0   1   2
>>   1   3 1587
>>     0   0
>>       0
>> NO BEST TRANSLATION
>>  ----------------------------------------------------------------------
>>
>> (An aside: 1587 is the number of categories in the unknown word list.  Why
>> does the last token, viz., "</s>", get that many cells? )
>>
>> Anyhow, sure enough, there are three entries for the middle token "说了算"
>>
>> ----------------------------------------------------------------------
>> $ zless rule-table.gz
>> ...
>> 说了算 [X] ||| is [((S\NP[expl])/(S[to]\NP))/(S[adj]\NP)] ||| 0.000113126
>> 6.94e-05 0.00475133 0.5 2.718 ||| ||| 126 3
>> 说了算 [X] ||| is necessary [(S\NP[expl])/(S[to]\NP)] ||| 0.000309866
>> 6.94e-05 0.00475133 0.00028945 2.718 ||| ||| 46 3
>> 说了算 [X] ||| is necessary to [(S\NP[expl])/(S[b]\NP)] ||| 0.000208847
>> 6.94e-05 0.00475133 1.07891e-05 2.718 ||| ||| 68.25 3
>> ...
>> ----------------------------------------------------------------------
>>
>> There are entries in the glue table for these three categories --
>> ((S\NP[expl])/(S[to]\NP))/(S[adj]\NP), (S\NP[expl])/(S[to]\NP) and
>> (S\NP[expl])/(S[b]\NP) --- so we should be able to hack together a
>> translation using any of them.
>>
>> ----------------------------------------------------------------------
>> <s> [X] ||| <s> [Q] ||| 1  |||
>> ...
>> [X][Q] [X][((S\NP[expl])/(S[to]\NP))/(S[adj]\NP)] [X] ||| [X][Q]
>> [X][((S\NP[expl])/(S[to]\NP))/(S[adj]\NP)] [Q] ||| 2.718 ||| 0-0 1-1
>> ...
>> [X][Q] [X][(S\NP[expl])/(S[to]\NP)] [X] ||| [X][Q]
>> [X][(S\NP[expl])/(S[to]\NP)] [Q] ||| 2.718 ||| 0-0 1-1
>> ...
>> [X][Q] [X][(S\NP[expl])/(S[b]\NP)] [X] ||| [X][Q]
>> [X][(S\NP[expl])/(S[b]\NP)] [Q] ||| 2.718 ||| 0-0 1-1
>> ...
>> ----------------------------------------------------------------------
>>
>> And just to be sure that it isn't an unknown word problem, let's mangle
>> the token "说了算" by deleting the last character and see what happens:
>>
>> ----------------------------------------------------------------------
>> $ echo "说了" | ../moses/bin/moses-chart-19-june-2011 -f
>> dev-test/ZhEn/mert/run1.moses.ini -cube-pruning-pop-limit 2000
>> Translating: <s> 说了 </s>  ||| [0,0]=X (1) [0,1]=X (1) [0,2]=X (1) [1,1]=X
>> (1) [1,2]=X (1) [2,2]=X (1)
>>
>> Num of hypo = 6396 --- cells:
>>   0   1   2
>>   1 1587 1587
>>     1   0
>>       1
>> BEST TRANSLATION:  4763 Q </s> :0-0 : pC=0.000, c=-1.002 [0..2] 3176
>> [total=-22.789] <<-1.303, -1.940, -46.302, 0.000, 0.000, 0.000, 0.000,
>> 0.000, 1.000>>
>> 说了
>> ----------------------------------------------------------------------
>>
>> The best "translation" is just a pass-through, as expected (and there are
>> 1587 nodes for that unknown token -- just as many as there are unknown word
>> lhs's in the unknown-lhs file).
>>
>> Strange. Very strange.  Or am I missing the obvious?
>>
>> I'm at a loss here.  Does anyone have any guesses as to what's going on
>> here?
>>
>> --D.N.
>>
>>
>>  2011/6/22 Philipp Koehn <[email protected]>
>>
>>> Hi,
>>>
>>> there always should be a rule to combine a span to the left.
>>>
>>> Check what labels are chosen for the 13th word, and why there
>>> are no glue rules for it.
>>>
>>> If I had to hazard a guess, I would suspect that this is an
>>> unknown word and a file with the likely labels for unknown words
>>> is used, but these do not match the glue grammar.
>>>
>>> -phi
>>>
>>> 2011/6/22 Dennis Mehay <[email protected]>:
>>>  > Hi all,
>>> >
>>> > I posted this, but it bounced.  My attachments were too big.  I'm
>>> > resending without the larger attachment.  Apologies for any duplicate
>>> > posting.
>>> >
>>> > I'm running moses_chart to do some syntax-based MT experiments, and,
>>> > during tuning, I'm coming across some instances where the decoder can't
>>> > produce a translation (between 32 and 38 in a 500 sentence tuning set).
>>> > This should not be happening, so far as I can tell, since I have a glue
>>> > grammar (where all the nonterminals of the training set plus the [Q]
>>> > nonterminal are accounted for), and an 'unknown-lhs' list with the
>>> > relative frequencies of all the categories as they span only a single
>>> > word in the training set (i.e., the frequency of each category's
>>> > spanning a single word in the rule table / the total number of
>>> > single-word instances in the rule table).
>>> >
>>> > Here is an example of a sentence that there was no translation for:
>>> >
>>> > ---------------------------------------------------------------------------------------
>>> > Translating: <s> 没有 规划 作 指导 , 就 可能 出现 谁 有 权 谁 说了算 , 谁 官 大 谁 说了算 . </s>
>>> > ...
>>> > Decoding:
>>> > Num of hypo = 84813 --- cells:
>>> >   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21
>>> >   1 100  77  93  83  99  99 100 100  85  99  43  85   3  99  85  18 100  85   3  14 1000
>>> >    40 960 278 717 916 857 976 276 396 952 958 150   0   0 919  74 402 802   0   0  12
>>> >     200 975 908 849 850 858 968 971 971 862 974   0   0   0 852 865 984   0   0   0
>>> >       200 940 849 889 763 715 990 962 979 905   0   0   0   0 864 984   0   0   0
>>> >         200 868 939 886 863 803 887 861 981   0   0   0   0   0 871   0   0   0
>>> >           200 828 910 801 838 796 722 870   0   0   0   0   0   0   0   0   0
>>> >             200 799 914 832 801 745 926   0   0   0   0   0   0   0   0   0
>>> >               200 756 819 901 693 692   0   0   0   0   0   0   0   0   0
>>> >                 200 716 680 665 437   0   0   0   0   0   0   0   0   0
>>> >                   200 683 527 929   0   0   0   0   0   0   0   0   0
>>> >                     200 532 588   0   0   0   0   0   0   0   0   0
>>> >                       200 580   0   0   0   0   0   0   0   0   0
>>> >                         200   0   0   0   0   0   0   0   0   0
>>> >                             0   0   0   0   0   0   0   0   0
>>> >                               0   0   0   0   0   0   0   0
>>> >                                 0   0   0   0   0   0   0
>>> >                                   0   0   0   0   0   0
>>> >                                     0   0   0   0   0
>>> >                                       0   0   0   0
>>> >                                         0   0   0
>>> >                                           0   0
>>> >                                             0
>>> > NO BEST TRANSLATION
>>> >
>>> > Translation took 4.340 seconds
>>> >
>>> ---------------------------------------------------------------------------------------
>>> >
>>> > The ASCII-art chart's alignment may be a bit off, but, just eye-balling
>>> > it, it looks as if the 19th word (index 18) has a chart entry count above
>>> > it, but then this entry does not get combined with what's to the left
>>> > using the glue rules.
>>> >
>>> > Could this be a pruning or cutoff issue (i.e., stack size,
>>> > cube-pruning-pop-limit, maximum number of rules per span, etc.)?  Or
>>> > maybe it has to do with the fact that my unknown-lhs file has *all*
>>> > categories that spanned a single word in the training set.  Maybe I
>>> > should prune it to the top 10 or 20, or so.  I'm really at a loss here.
>>> > I thought the glue grammar would make the decoder always return an
>>> > answer, no matter how awful.
>>> >
>>> > Any insight?
>>> >
>>> > I have attached my moses.ini file in case anyone wants to have a look.
>>> > I can also send the glue rule file later, but, as I said, it seems to
>>> > account for all of the training set's categories (and it was produced
>>> > automatically using the -glue-grammar option).
>>> >
>>> > Best,
>>> > Dennis
>>>  > _______________________________________________
>>> > Moses-support mailing list
>>> > [email protected]
>>> > http://mailman.mit.edu/mailman/listinfo/moses-support
>>> >
>>> >
>>>
>>
>>
>

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
