Hi all,

I posted this, but it bounced.  My attachments were too big.  I'm resending
without the larger attachment.  Apologies for any duplicate posting.

I'm running moses_chart to do some syntax-based MT experiments, and, during
tuning, I'm coming across some instances where the decoder can't produce a
translation (between 32 and 38 sentences in a 500-sentence tuning set).
This should not be happening, so far as I can tell, since I have a glue
grammar (where all the nonterminals of the training set plus the [Q]
nonterminal are accounted for), and an 'unknown-lhs' list with the relative
frequencies of all the categories as they span only a single word in the
training set (i.e., the number of times each category spans a single word
in the rule table, divided by the total number of single-word instances in
the rule table).
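To make the relative-frequency computation concrete, here is a minimal
sketch of what I mean. It assumes the left-hand-side labels of all
single-word rules have already been pulled out of the rule table into a
plain list; the function names and the input format are hypothetical, not
anything Moses provides, and the renormalisation in top_k is just one
possible way of pruning the list.

```python
from collections import Counter


def unknown_lhs_distribution(single_word_labels):
    """Relative frequency of each label among single-word rule instances.

    `single_word_labels` is a hypothetical input: one label per rule-table
    entry whose source side spans exactly one word.
    """
    counts = Counter(single_word_labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    # count of the label / total number of single-word instances
    return {label: n / total for label, n in counts.most_common()}


def top_k(dist, k):
    """Keep the k most frequent labels and renormalise (an assumption)."""
    kept = sorted(dist.items(), key=lambda kv: kv[1], reverse=True)[:k]
    mass = sum(p for _, p in kept)
    return {label: p / mass for label, p in kept}


if __name__ == "__main__":
    labels = ["NN", "NN", "VV", "NR"]  # toy data, not from my rule table
    dist = unknown_lhs_distribution(labels)
    print(dist)
    print(top_k(dist, 2))
```

The labels in the usage example are toy values for illustration only.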

Here is an example of a sentence for which no translation was produced:

---------------------------------------------------------------------------------------
Translating: <s> 没有 规划 作 指导 , 就 可能 出现 谁 有 权 谁 说了算 , 谁 官 大 谁 说了算 . </s>
...
Decoding:
Num of hypo = 84813 --- cells:
  0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21
  1 100  77  93  83  99  99 100 100  85  99  43  85   3  99  85  18 100  85   3  14 1000
   40 960 278 717 916 857 976 276 396 952 958 150   0   0 919  74 402 802   0   0  12
    200 975 908 849 850 858 968 971 971 862 974   0   0   0 852 865 984   0   0   0
      200 940 849 889 763 715 990 962 979 905   0   0   0   0 864 984   0   0   0
        200 868 939 886 863 803 887 861 981   0   0   0   0   0 871   0   0   0
          200 828 910 801 838 796 722 870   0   0   0   0   0   0   0   0   0
            200 799 914 832 801 745 926   0   0   0   0   0   0   0   0   0
              200 756 819 901 693 692   0   0   0   0   0   0   0   0   0
                200 716 680 665 437   0   0   0   0   0   0   0   0   0
                  200 683 527 929   0   0   0   0   0   0   0   0   0
                    200 532 588   0   0   0   0   0   0   0   0   0
                      200 580   0   0   0   0   0   0   0   0   0
                        200   0   0   0   0   0   0   0   0   0
                            0   0   0   0   0   0   0   0   0
                              0   0   0   0   0   0   0   0
                                0   0   0   0   0   0   0
                                  0   0   0   0   0   0
                                    0   0   0   0   0
                                      0   0   0   0
                                        0   0   0
                                          0   0
                                            0
NO BEST TRANSLATION

Translation took 4.340 seconds
---------------------------------------------------------------------------------------

The ASCII-art chart's alignment may be a bit off, but, just eyeballing it,
it looks as if the 19th word (index 18) has a chart entry count above it,
but this entry does not get combined with what's to the left using the
glue rules.
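Rather than eyeballing it, the chart can be checked mechanically. The
following is a rough sketch under my own assumptions about the layout: the
counts (without the header row of word indices) are read as one flat
sequence of integers, so mail line-wrapping does not matter, and the k-th
row is taken to hold the counts for spans of width k, one per start
position. The helper names are hypothetical. It reports the widest span
starting at word 0 that has any hypotheses, and the words that have
single-word entries but are never covered by a non-empty wider span.

```python
def parse_chart(text, n):
    """Split whitespace-separated counts into rows for an n-word sentence.

    rows[w - 1][i] is the hypothesis count for the span of width w that
    starts at word i (assumed layout; the header row must be excluded).
    """
    numbers = [int(tok) for tok in text.split()]
    expected = n * (n + 1) // 2
    if len(numbers) != expected:
        raise ValueError(f"expected {expected} counts, got {len(numbers)}")
    rows, pos = [], 0
    for width in range(1, n + 1):
        size = n - width + 1
        rows.append(numbers[pos:pos + size])
        pos += size
    return rows


def longest_prefix_span(rows):
    """Largest width w such that the span starting at word 0 is non-empty."""
    best = 0
    for width, row in enumerate(rows, start=1):
        if row[0] > 0:
            best = width
    return best


def blocked_words(rows):
    """Words with single-word entries that no non-empty wider span covers."""
    n = len(rows)
    blocked = []
    for i in range(n):
        covered = any(
            rows[w - 1][s] > 0
            for w in range(2, n + 1)
            for s in range(max(0, i - w + 1), min(i, n - w) + 1)
        )
        if rows[0][i] > 0 and not covered:
            blocked.append(i)
    return blocked


if __name__ == "__main__":
    # Toy 4-word chart, not the one from the log above.
    toy = """
    5 3 2 4
      7 0 0
        0 0
          0
    """
    rows = parse_chart(toy, 4)
    print(longest_prefix_span(rows))
    print(blocked_words(rows))
```

Any word index this reports as blocked would be the first place I'd look
for a label that the glue rules cannot attach.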

Could this be a pruning or cutoff issue (i.e., stack size,
cube-pruning-pop-limit, maximum number of rules per span, etc.)?  Or maybe
it has to do with the fact that my unknown-lhs file has *all* categories
that spanned a single word in the training set.  Maybe I should prune it to
the top 10 or 20, or so.  I'm really at a loss here.  I thought the glue
grammar would make the decoder always return an answer, no matter how awful.

Any insight?

I have attached my moses.ini file in case anyone wants to have a look.  I
can also send the glue rule file later, but, as I said, it seems to account
for all of the training set's categories (and it was produced automatically
using the -glue-grammar option).

Best,
Dennis

Attachment: moses.ini