Hi,

There should always be a rule that combines a span with what lies to its left.

Check what labels are chosen for the 13th word, and why there
are no glue rules for it.
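One way to find that word is to read down the leftmost diagonal of the chart in the quoted log. Assuming the usual left-branching glue rules, a full translation is grown from <s> rightwards, so every prefix span [0, w] needs at least one entry. The sketch below assumes that row r, column c of the printed chart holds the entry count for the span of r + 1 words starting at word c (row 0 = single words, last row = whole sentence). The numbers are copied from the log; the function name is hypothetical.

```python
from typing import List, Optional

# Cell counts copied from the quoted log. Assumed layout: chart[r][c] is the
# number of entries for the span of r + 1 words starting at word c.
TOKENS = "<s> 没有 规划 作 指导 , 就 可能 出现 谁 有 权 谁 说了算 , 谁 官 大 谁 说了算 . </s>".split()
N = len(TOKENS)  # 22 positions, 0..21

chart: List[List[int]] = [
    [1, 100, 77, 93, 83, 99, 99, 100, 100, 85, 99, 43, 85, 3, 99, 85, 18, 100, 85, 3, 14, 1000],
    [40, 960, 278, 717, 916, 857, 976, 276, 396, 952, 958, 150, 0, 0, 919, 74, 402, 802, 0, 0, 12],
    [200, 975, 908, 849, 850, 858, 968, 971, 971, 862, 974, 0, 0, 0, 852, 865, 984, 0, 0, 0],
    [200, 940, 849, 889, 763, 715, 990, 962, 979, 905, 0, 0, 0, 0, 864, 984, 0, 0, 0],
    [200, 868, 939, 886, 863, 803, 887, 861, 981, 0, 0, 0, 0, 0, 871, 0, 0, 0],
    [200, 828, 910, 801, 838, 796, 722, 870] + [0] * 9,
    [200, 799, 914, 832, 801, 745, 926] + [0] * 9,
    [200, 756, 819, 901, 693, 692] + [0] * 9,
    [200, 716, 680, 665, 437] + [0] * 9,
    [200, 683, 527, 929] + [0] * 9,
    [200, 532, 588] + [0] * 9,
    [200, 580] + [0] * 9,
    [200] + [0] * 9,
]
# Rows 13..21 are all zeros in the log.
chart += [[0] * (N - r) for r in range(13, N)]


def first_unreachable_prefix(cells: List[List[int]]) -> Optional[int]:
    """Return the first word index w whose prefix span [0, w] has no entries.

    cells[r][0] is the count for the span [0, r]; an empty prefix span means
    word r could not be attached to everything before it.
    """
    for r, row in enumerate(cells):
        if row[0] == 0:
            return r
    return None


w = first_unreachable_prefix(chart)
if w is not None:
    print(w, TOKENS[w])  # prints: 13 说了算
```

With this data the first empty prefix span ends at word 13, which matches the 13th word mentioned above. A zero cell could also be caused by pruning, so this only narrows down where to look.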

If I had to hazard a guess, I would suspect that this is an
unknown word, and that the file with the likely labels for unknown
words is being used, but its labels do not match the glue grammar.
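A quick way to test this guess is to compare the labels in the unknown-lhs file against the nonterminals that occur in the glue grammar. This sketch assumes one `LABEL probability` pair per line in the unknown-lhs file and that glue-grammar nonterminals appear in square brackets (e.g. `[X][NP]`). Both formats are assumptions, labels that themselves contain square brackets would need a different parser, and the sample lines are made up for illustration.

```python
import re
from typing import Iterable, Set

# Matches each bracketed nonterminal, e.g. "X" and "NP" in "[X][NP]".
# Assumes labels contain no square brackets themselves.
LABEL_RE = re.compile(r"\[([^\[\]]+)\]")


def glue_labels(glue_lines: Iterable[str]) -> Set[str]:
    """Collect every bracketed label that appears anywhere in the glue rules."""
    labels: Set[str] = set()
    for line in glue_lines:
        labels.update(LABEL_RE.findall(line))
    return labels


def unknown_lhs_labels(lhs_lines: Iterable[str]) -> Set[str]:
    # Assumed format: one "LABEL probability" pair per line; blank lines skipped.
    return {line.split()[0].strip("[]") for line in lhs_lines if line.strip()}


def uncovered_labels(lhs_lines: Iterable[str], glue_lines: Iterable[str]) -> Set[str]:
    """Labels an unknown word may receive that no glue rule mentions."""
    return unknown_lhs_labels(lhs_lines) - glue_labels(glue_lines)


# Hypothetical sample lines, for illustration only.
glue = [
    "<s> [X] ||| <s> [S] ||| 1 |||",
    "[X][S] [X][NP] [X] ||| [X][S] [X][NP] [S] ||| 1 ||| 0-0 1-1",
]
lhs = ["NP 0.6", "VV 0.4"]
print(sorted(uncovered_labels(lhs, glue)))  # prints: ['VV']
```

On real data, both arguments can be open file handles (opened with `encoding="utf-8"`); any label it prints is one that an unknown word could be given but that the glue rules cannot combine.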

-phi

2011/6/22 Dennis Mehay <[email protected]>:
> Hi all,
>
> I posted this, but it bounced.  My attachments were too big.  I'm resending
> without the larger attachment.  Apologies for any duplicate posting.
>
> I'm running moses_chart to do some syntax-based MT experiments, and during
> tuning I'm coming across some instances where the decoder can't produce a
> translation (between 32 and 38 in a 500-sentence tuning set).  As far as I
> can tell, this should not happen, since I have a glue grammar (where all
> the nonterminals of the training set plus the [Q] nonterminal are accounted
> for) and an 'unknown-lhs' list with the relative frequencies of all the
> categories as they span only a single word in the training set (i.e., the
> frequency of each category's spanning a single word in the rule table
> divided by the total number of single-word instances in the rule table).
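The relative-frequency computation described above can be sketched as follows. It starts from a list of the LHS labels of single-word rule instances (pulling those out of the rule table depends on its format and is not shown) and writes one `LABEL probability` pair per line, which is the layout assumed here for the unknown-lhs file. The labels in the example are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def unknown_lhs_distribution(single_word_labels: Iterable[str]) -> Dict[str, float]:
    """Count of each label divided by the total number of single-word instances."""
    counts = Counter(single_word_labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.items()}


def format_unknown_lhs(dist: Dict[str, float]) -> str:
    # One "LABEL probability" line per category, most frequent first
    # (the line format is an assumption about the unknown-lhs file).
    ordered = sorted(dist.items(), key=lambda kv: kv[1], reverse=True)
    return "\n".join(f"{label} {p:.6f}" for label, p in ordered)


# Hypothetical LHS labels of single-word rule instances.
labels = ["NN", "NN", "VV", "NN", "AD"]
print(format_unknown_lhs(unknown_lhs_distribution(labels)))
# prints:
# NN 0.600000
# VV 0.200000
# AD 0.200000
```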
>
> Here is an example of a sentence for which no translation was produced:
>
> ---------------------------------------------------------------------------------------
> Translating: <s> 没有 规划 作 指导 , 就 可能 出现 谁 有 权 谁 说了算 , 谁 官 大 谁 说了算 . </s>
> ...
> Decoding:
> Num of hypo = 84813 --- cells:
>   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21
>   1 100  77  93  83  99  99 100 100  85  99  43  85   3  99  85  18 100  85   3  14 1000
>    40 960 278 717 916 857 976 276 396 952 958 150   0   0 919  74 402 802   0   0  12
>     200 975 908 849 850 858 968 971 971 862 974   0   0   0 852 865 984   0   0   0
>       200 940 849 889 763 715 990 962 979 905   0   0   0   0 864 984   0   0   0
>         200 868 939 886 863 803 887 861 981   0   0   0   0   0 871   0   0   0
>           200 828 910 801 838 796 722 870   0   0   0   0   0   0   0   0   0
>             200 799 914 832 801 745 926   0   0   0   0   0   0   0   0   0
>               200 756 819 901 693 692   0   0   0   0   0   0   0   0   0
>                 200 716 680 665 437   0   0   0   0   0   0   0   0   0
>                   200 683 527 929   0   0   0   0   0   0   0   0   0
>                     200 532 588   0   0   0   0   0   0   0   0   0
>                       200 580   0   0   0   0   0   0   0   0   0
>                         200   0   0   0   0   0   0   0   0   0
>                             0   0   0   0   0   0   0   0   0
>                               0   0   0   0   0   0   0   0
>                                 0   0   0   0   0   0   0
>                                   0   0   0   0   0   0
>                                     0   0   0   0   0
>                                       0   0   0   0
>                                         0   0   0
>                                           0   0
>                                             0
> NO BEST TRANSLATION
>
> Translation took 4.340 seconds
> ---------------------------------------------------------------------------------------
>
> The ASCII-art chart's alignment may be a bit off, but, just eyeballing it,
> it looks as if the 19th word (index 18) has a chart entry count above it,
> but this entry then does not get combined with what's to its left using
> the glue rules.
>
> Could this be a pruning or cutoff issue (i.e., stack size,
> cube-pruning-pop-limit, maximum number of rules per span, etc.)?  Or maybe
> it has to do with the fact that my unknown-lhs file has *all* categories
> that spanned a single word in the training set.  Maybe I should prune it to
> the top 10 or 20, or so.  I'm really at a loss here.  I thought the glue
> grammar would make the decoder always return an answer, no matter how awful.
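Pruning the unknown-lhs list to the top k labels could look like the following sketch. It keeps the k most probable entries of a label distribution and renormalizes them to sum to 1; the renormalization is an assumption on my part, not something this thread says the decoder requires. Whichever labels are kept, each still needs a matching glue rule, which is the mismatch suggested in the reply at the top of this thread.

```python
from typing import Dict


def prune_unknown_lhs(dist: Dict[str, float], k: int) -> Dict[str, float]:
    """Keep the k most probable labels and renormalize them to sum to 1."""
    top = sorted(dist.items(), key=lambda kv: kv[1], reverse=True)[:k]
    mass = sum(p for _, p in top)
    return {label: p / mass for label, p in top}


# Hypothetical distribution, for illustration only.
dist = {"NN": 0.5, "VV": 0.3, "AD": 0.2}
print(prune_unknown_lhs(dist, 2))
```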
>
> Any insight?
>
> I have attached my moses.ini file in case anyone wants to have a look.  I
> can also send the glue rule file later, but, as I said, it seems to account
> for all of the training set's categories (and it was produced automatically
> using the -glue-grammar option).
>
> Best,
> Dennis
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
>
