Hi Hieu,
With ttable-limit values of 100 and 0:
--------------------------------------------------------
Translating: <s> 说了算 </s> ||| [0,0]=X (1) [0,1]=X (1) [0,2]=X (1) [1,1]=X (1) [1,2]=X (1) [2,2]=X (1)
Num of hypo = 4 --- cells:
0 1 2
1 3 0
0 0
0
NO BEST TRANSLATION
--------------------------------------------------------
And with ttable-limit values of 100 and 100000000:
--------------------------------------------------------
Translating: <s> 说了算 </s> ||| [0,0]=X (1) [0,1]=X (1) [0,2]=X (1) [1,1]=X (1) [1,2]=X (1) [2,2]=X (1)
Num of hypo = 4 --- cells:
0 1 2
1 3 0
0 0
0
NO BEST TRANSLATION
--------------------------------------------------------
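For reference, the two runs above differ only in the second [ttable-limit] row. The rule described in the quoted reply below (one row per table, and a single row applied to every table) can be sketched in a few lines of Python. This is only an illustration, not the decoder's actual code: the function names are made up, and reading a limit of 0 as "no limit" is an assumption that the sketch does not encode.

```python
def parse_ini_section(text, section):
    """Collect the non-empty, non-comment lines under [section]
    in moses.ini-style text (hypothetical helper)."""
    values, inside = [], False
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("[") and line.endswith("]"):
            # A new section header starts; remember whether it is ours.
            inside = (line == "[" + section + "]")
        elif inside and line and not line.startswith("#"):
            values.append(line)
    return values


def expand_ttable_limits(rows, num_tables):
    """Return one pruning limit per table.

    A single row is applied to every table; otherwise there must be
    exactly one row per table, in table order.
    """
    if len(rows) == 1:
        return list(rows) * num_tables
    if len(rows) != num_tables:
        raise ValueError("expected 1 or %d rows, got %d" % (num_tables, len(rows)))
    return list(rows)


ini_text = "[ttable-limit]\n100\n0\n"
rows = [int(v) for v in parse_ini_section(ini_text, "ttable-limit")]
print(expand_ttable_limits(rows, 2))
```

With two tables (rule table first, glue grammar second), the second row is the one that governs the glue rules.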
This is from a fresh svn checkout that I compiled just before running. The
glue rules seem to be failing when trying to combine the chart cells that
cover "<s> 说了算".
My glue grammar has 4666 entries in it, for what it's worth. I can send it
to you if you want, but it might be too big to put up here on the forum.
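Rather than posting the whole glue grammar, one offline check is to collect the target-side labels that the rule table assigns to a source phrase and see whether each has a binary glue rule. The sketch below assumes the line formats shown in the quoted messages further down (fields separated by "|||", the label as the last bracketed token of the target side, and glue rules of the form "[X][Q] [X][LABEL] [X]"). The function names are hypothetical.

```python
def target_lhs(rule_line):
    """Label of a rule-table entry: the last token of the target
    field, with its outer square brackets removed."""
    target = rule_line.split("|||")[1].split()
    return target[-1][1:-1]


def glue_labels(glue_lines):
    """Labels that have a binary glue rule '[X][Q] [X][LABEL] [X]'."""
    labels = set()
    prefix = "[X]["
    for line in glue_lines:
        source = line.split("|||")[0].split()
        if (len(source) == 3 and source[0] == "[X][Q]"
                and source[1].startswith(prefix) and source[1].endswith("]")):
            labels.add(source[1][len(prefix):-1])
    return labels


def missing_glue(rule_lines, glue_lines):
    """Rule-table labels with no matching binary glue rule."""
    have = glue_labels(glue_lines)
    return sorted({target_lhs(line) for line in rule_lines} - have)


rules = [
    r"说了算 [X] ||| is necessary [(S\NP[expl])/(S[to]\NP)] ||| 0.000309866 6.94e-05 0.00475133 0.00028945 2.718 ||| ||| 46 3",
]
glue = [
    r"<s> [X] ||| <s> [Q] ||| 1 |||",
    r"[X][Q] [X][(S\NP[expl])/(S[to]\NP)] [X] ||| [X][Q] [X][(S\NP[expl])/(S[to]\NP)] [Q] ||| 2.718 ||| 0-0 1-1",
]
print(missing_glue(rules, glue))
```

An empty list means every label seen for the phrase is covered in the glue file itself, which would point the search at pruning or loading rather than at a missing rule.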
Is there a quick-and-dirty way to see what categories are inserted into
which cells when (some verbosity setting, perhaps)?
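In the meantime, the cell-count dump can at least be turned into a list of empty spans, even though it does not name categories. The sketch below assumes a layout inferred from the output in this thread, not taken from the decoder source: the first line is the position header, and the k-th line after it holds the counts for spans of width k, ordered by start position. The function names are hypothetical.

```python
def parse_cells(dump):
    """Map (start, end) spans to hypothesis counts from the text
    that follows 'cells:' in the decoder output."""
    lines = [line.split() for line in dump.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    n = len(header)
    chart = {}
    for extra, row in enumerate(rows):  # extra = span width - 1
        if len(row) != n - extra:
            raise ValueError("row %d has %d entries, expected %d"
                             % (extra, len(row), n - extra))
        for start, count in enumerate(row):
            chart[(start, start + extra)] = int(count)
    return chart


def empty_spans(chart):
    """Spans with no hypotheses, sorted by start and then end."""
    return sorted(span for span, count in chart.items() if count == 0)


dump = """
0 1 2
1 3 0
0 0
0
"""
print(empty_spans(parse_cells(dump)))
```

For the three-token input above this lists [0,1], [0,2], [1,2] and [2,2] as empty, so nothing is ever built on top of the three entries in cell [1,1].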
> I corrected this behaviour recently
>
> http://mosesdecoder.svn.sourceforge.net/viewvc/mosesdecoder/trunk/moses/src/ChartTranslationOptionCollection.cpp?r1=4004&r2=4003&pathrev=4004
Ah, yes. I named the binary ...19-june-2011, but I had copied it from a
previous svn checkout, sorry. These things are still happening on the
latest checkout, though.
--D.N.
2011/6/22 Hieu Hoang <[email protected]>
> hi dennis
>
> You're right, it should be working. The entries in the glue rules might be
> pruned. Can you try changing the [ttable-limit] section in the ini file to
> [ttable-limit]
> 100
> 10000000
> or
> [ttable-limit]
> 100
> 0
>
> Each row corresponds to the pruning limit for one table. If you
> provide only one entry, it prunes every table uniformly
> (see StaticData.cpp, line 894).
> For a grammar with lots of non-terminals like yours, the table limit may be
> cutting off some of the entries in the glue rule table.
>
> Also, the decoder shouldn't be processing <s> and </s> as unknown words,
> they should only be translated by the glue rules. This is the reason you get
> 1587 translations of </s>.
>
> I corrected this behaviour recently
>
> http://mosesdecoder.svn.sourceforge.net/viewvc/mosesdecoder/trunk/moses/src/ChartTranslationOptionCollection.cpp?r1=4004&r2=4003&pathrev=4004
>
>
> On 23/06/2011 06:14, Dennis Mehay wrote:
>
> Just in case it confuses anyone, both commands (below) were run in the same
> way, I just simplified it for expository purposes to " moses_chart -f
> moses.ini -cube-pruning-pop-limit 2000" in the first case, but not in the
> second.
>
> --D.N.
>
> 2011/6/22 Dennis Mehay <[email protected]>
>
>> Hi Philipp,
>>
>> Thanks for the reply. I tracked some of the cases down to a *known* word
>> (or whitespace-tokenized thingie, anyway -- I don't know much of what
>> constitutes a word in written Chinese) by doing the following:
>>
>> ----------------------------------------------------------------------
>> $ echo "说了算" | moses_chart -f moses.ini -cube-pruning-pop-limit 2000
>>
>> Translating: <s> 说了算 </s> ||| [0,0]=X (1) [0,1]=X (1) [0,2]=X (1) [1,1]=X (1) [1,2]=X (1) [2,2]=X (1)
>>
>> Num of hypo = 1591 --- cells:
>> 0 1 2
>> 1 3 1587
>> 0 0
>> 0
>> NO BEST TRANSLATION
>> ----------------------------------------------------------------------
>>
>> (An aside: 1587 is the number of categories in the unknown word list. Why
>> does the last token, viz., "</s>", get that many cells? )
>>
>> Anyhow, sure enough, there are three entries for the middle token "说了算"
>>
>> ----------------------------------------------------------------------
>> $ zless rule-table.gz
>> ...
>> 说了算 [X] ||| is [((S\NP[expl])/(S[to]\NP))/(S[adj]\NP)] ||| 0.000113126 6.94e-05 0.00475133 0.5 2.718 ||| ||| 126 3
>> 说了算 [X] ||| is necessary [(S\NP[expl])/(S[to]\NP)] ||| 0.000309866 6.94e-05 0.00475133 0.00028945 2.718 ||| ||| 46 3
>> 说了算 [X] ||| is necessary to [(S\NP[expl])/(S[b]\NP)] ||| 0.000208847 6.94e-05 0.00475133 1.07891e-05 2.718 ||| ||| 68.25 3
>> ...
>> ----------------------------------------------------------------------
>>
>> There are entries in the glue table for these three categories --
>> ((S\NP[expl])/(S[to]\NP))/(S[adj]\NP), (S\NP[expl])/(S[to]\NP) and
>> (S\NP[expl])/(S[b]\NP) --- so we should be able to hack together a
>> translation using any of them.
>>
>> ----------------------------------------------------------------------
>> <s> [X] ||| <s> [Q] ||| 1 |||
>> ...
>> [X][Q] [X][((S\NP[expl])/(S[to]\NP))/(S[adj]\NP)] [X] ||| [X][Q] [X][((S\NP[expl])/(S[to]\NP))/(S[adj]\NP)] [Q] ||| 2.718 ||| 0-0 1-1
>> ...
>> [X][Q] [X][(S\NP[expl])/(S[to]\NP)] [X] ||| [X][Q] [X][(S\NP[expl])/(S[to]\NP)] [Q] ||| 2.718 ||| 0-0 1-1
>> ...
>> [X][Q] [X][(S\NP[expl])/(S[b]\NP)] [X] ||| [X][Q] [X][(S\NP[expl])/(S[b]\NP)] [Q] ||| 2.718 ||| 0-0 1-1
>> ...
>> ----------------------------------------------------------------------
>>
>> And just to be sure that it isn't an unknown word problem, let's mangle
>> the token "说了算" by deleting the last character and see what happens:
>>
>> ----------------------------------------------------------------------
>> $ echo "说了" | ../moses/bin/moses-chart-19-june-2011 -f dev-test/ZhEn/mert/run1.moses.ini -cube-pruning-pop-limit 2000
>> Translating: <s> 说了 </s> ||| [0,0]=X (1) [0,1]=X (1) [0,2]=X (1) [1,1]=X (1) [1,2]=X (1) [2,2]=X (1)
>>
>> Num of hypo = 6396 --- cells:
>> 0 1 2
>> 1 1587 1587
>> 1 0
>> 1
>> BEST TRANSLATION: 4763 Q </s> :0-0 : pC=0.000, c=-1.002 [0..2] 3176 [total=-22.789] <<-1.303, -1.940, -46.302, 0.000, 0.000, 0.000, 0.000, 0.000, 1.000>>
>> 说了
>> ----------------------------------------------------------------------
>>
>> The best "translation" is just a pass-through, as expected (and there are
>> 1587 nodes for that unknown token -- just as many as there are unknown word
>> lhs's in the unknown-lhs file).
>>
>> Strange. Very strange. Or am I missing the obvious?
>>
>> I'm at a loss here. Does anyone have any guesses as to what's going on
>> here?
>>
>> --D.N.
>>
>>
>> 2011/6/22 Philipp Koehn <[email protected]>
>>
>>> Hi,
>>>
>>> there always should be a rule to combine a span to the left.
>>>
>>> Check what labels are chosen for the 13th word, and why there
>>> are no glue rules for it.
>>>
>>> If I would hazard a guess, I would suspect that this is an
>>> unknown word and a file with the likely labels for unknown words
>>> is used, but these do not match the glue grammar.
>>>
>>> -phi
>>>
>>> 2011/6/22 Dennis Mehay <[email protected]>:
>>> > Hi all,
>>> >
>>> > I posted this, but it bounced. My attachments were too big. I'm
>>> > resending without the larger attachment. Apologies for any
>>> > duplicate posting.
>>> >
>>> > I'm running moses_chart to do some syntax-based MT experiments, and,
>>> > during tuning, I'm coming across some instances where the decoder
>>> > can't produce a translation (between 32 and 38 in a 500-sentence
>>> > tuning set). This should not be happening, so far as I can tell,
>>> > since I have a glue grammar (where all the nonterminals of the
>>> > training set plus the [Q] nonterminal are accounted for), and an
>>> > 'unknown-lhs' list with the relative frequencies of all the
>>> > categories as they span only a single word in the training set
>>> > (i.e., the frequency of each category's spanning a single word in
>>> > the rule table / the total number of single-word instances in the
>>> > rule table).
>>> >
>>> > Here is an example of a sentence that there was no translation for:
>>> >
>>> > ---------------------------------------------------------------------------------------
>>> > Translating: <s> 没有 规划 作 指导 , 就 可能 出现 谁 有 权 谁 说了算 , 谁 官 大 谁 说了算 . </s>
>>> > ...
>>> > Decoding:
>>> > Num of hypo = 84813 --- cells:
>>> > 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
>>> > 1 100 77 93 83 99 99 100 100 85 99 43 85 3 99 85 18 100 85 3 14 1000
>>> > 40 960 278 717 916 857 976 276 396 952 958 150 0 0 919 74 402 802 0 0 12
>>> > 200 975 908 849 850 858 968 971 971 862 974 0 0 0 852 865 984 0 0 0
>>> > 200 940 849 889 763 715 990 962 979 905 0 0 0 0 864 984 0 0 0
>>> > 200 868 939 886 863 803 887 861 981 0 0 0 0 0 871 0 0 0
>>> > 200 828 910 801 838 796 722 870 0 0 0 0 0 0 0 0 0
>>> > 200 799 914 832 801 745 926 0 0 0 0 0 0 0 0 0
>>> > 200 756 819 901 693 692 0 0 0 0 0 0 0 0 0
>>> > 200 716 680 665 437 0 0 0 0 0 0 0 0 0
>>> > 200 683 527 929 0 0 0 0 0 0 0 0 0
>>> > 200 532 588 0 0 0 0 0 0 0 0 0
>>> > 200 580 0 0 0 0 0 0 0 0 0
>>> > 200 0 0 0 0 0 0 0 0 0
>>> > 0 0 0 0 0 0 0 0 0
>>> > 0 0 0 0 0 0 0 0
>>> > 0 0 0 0 0 0 0
>>> > 0 0 0 0 0 0
>>> > 0 0 0 0 0
>>> > 0 0 0 0
>>> > 0 0 0
>>> > 0 0
>>> > 0
>>> > NO BEST TRANSLATION
>>> >
>>> > Translation took 4.340 seconds
>>> >
>>> > ---------------------------------------------------------------------------------------
>>> >
>>> > The ASCII-art chart's alignment may be a bit off, but, just
>>> > eye-balling it, it looks as if the 19th word (index 18) has a chart
>>> > entry count above it, but then this entry does not get combined with
>>> > what's to the left using the glue rules.
>>> >
>>> > Could this be a pruning or cutoff issue (i.e., stack size,
>>> > cube-pruning-pop-limit, maximum number of rules per span, etc.)? Or
>>> > maybe it has to do with the fact that my unknown-lhs file has *all*
>>> > categories that spanned a single word in the training set. Maybe I
>>> > should prune it to the top 10 or 20, or so. I'm really at a loss
>>> > here. I thought the glue grammar would make the decoder always
>>> > return an answer, no matter how awful.
>>> >
>>> > Any insight?
>>> >
>>> > I have attached my moses.ini file in case anyone wants to have a
>>> > look. I can also send the glue rule file later, but, as I said, it
>>> > seems to account for all of the training set's categories (and it
>>> > was produced automatically using the -glue-grammar option).
>>> >
>>> > Best,
>>> > Dennis
>>> > _______________________________________________
>>> > Moses-support mailing list
>>> > [email protected]
>>> > http://mailman.mit.edu/mailman/listinfo/moses-support
>>> >
>>> >
>>>
>>
>>
>