Hi all, I am trying to use lattice decoding (with input-type=2, i.e. a real
word lattice, not a confusion network) and I think I have identified some
issues with the way unknown words are handled:

If I understand correctly, the weight applied to the penalty attached
to a given edge of the lattice is weight-i, which is
implemented as an additional translation table weight. But unknown
words have ALL translation weights set to 0, including that one, so the edge
cost is discarded for an unknown word (i.e. a translation generated by
ProcessOneUnknownWord).
This means that if presented with two edges carrying two different unknown
words, the decoder will pick the first edge, not the one with the smallest cost.
Is that correct?
I tested this with the resources in
mosesdecoder/regression-testing/tests/lattice-distortion and the lattice:

((('UW1',0.0,1),('UW2',1.0,1),),)
The result is UW1, which is not correct (p=0.0 means a cost of -100, p=1.0
means a cost of 0, so the smallest cost should be UW2).
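To make that expectation explicit, here is a minimal sketch of the scoring I
would expect (assuming, as described above, that the log of p=0.0 is floored
at -100; the floor value itself is my assumption based on the observed cost):

```python
import math

def edge_score(p, floor=-100.0):
    # Assumed convention: edge score is log(p), floored at -100 for p=0.
    return max(math.log(p) if p > 0 else float("-inf"), floor)

uw1 = edge_score(0.0)  # floored to -100.0
uw2 = edge_score(1.0)  # log(1) = 0.0

# The edge with the higher score (smaller cost) should win, i.e. UW2.
best = "UW1" if uw1 > uw2 else "UW2"
print(best)
```

With the current behaviour the decoder returns UW1 instead, because both
edge scores are multiplied by zeroed unknown-word translation weights.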

A second problem is that the feature weight associated with the unknown
word penalty cannot be modified (it is always 1), so an unknown word always
gets a score penalty of -100 in addition to the LM cost.
This means that an edge with probability 0 (i.e. cost -100*weight-i + LM
cost) labeled with a known word will always be preferred to an edge with
probability 1 (i.e. edge cost 0) labeled with an unknown word
(cost -100*1 = -100, plus LM cost), unless weight-i is bigger than 1.
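To make the arithmetic concrete, a small sketch (LM costs are left out of the
comparison since they apply to both edges; the -100 floor and the fixed
penalty weight of 1 are as described above):

```python
def known_edge_total(edge_cost, weight_i):
    # Known word: the lattice edge cost is scaled by weight-i.
    return edge_cost * weight_i

def unknown_edge_total():
    # Unknown word: the edge cost is dropped entirely (problem 1), and the
    # -100 penalty is applied with a fixed, untunable weight of 1 (problem 2).
    return 0.0 + (-100.0) * 1.0

# Known edge 'A' with p=0.0 (cost -100) vs unknown edge 'UW' with p=1.0:
for weight_i in (0.2, 0.5, 1.5):
    known = known_edge_total(-100.0, weight_i)
    # For any weight-i < 1 the known-word edge scores higher, despite p=0.
    print(weight_i, known > unknown_edge_total())
```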

Is that correct?
I tested with:
((('A',0.0,1),('UW',1.0,1),),)
The result is "1" (the translation of A),
which is not what I want.


If I am right, I plan to correct this by:

1) applying the edge cost even for an unknown word
2) introducing a weight-u option to tune the unknown word feature weight
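A sketch of the scoring I have in mind (weight_u is the proposed new option;
how exactly it would hook into ProcessOneUnknownWord is of course still open):

```python
def unknown_edge_total(edge_cost, weight_i, weight_u):
    # Proposed fix: keep the lattice edge cost, scaled by weight-i like any
    # other edge (point 1), and apply the -100 unknown word penalty with a
    # tunable weight-u instead of a fixed weight of 1 (point 2).
    return edge_cost * weight_i + (-100.0) * weight_u

# With the fix, the unknown edge 'UW' with p=1.0 (edge cost 0) can beat a
# known edge 'A' with p=0.0 (edge cost -100) once weight-u is small enough:
known = -100.0 * 0.5                          # known 'A' edge, weight-i = 0.5
unknown = unknown_edge_total(0.0, 0.5, 0.3)   # hypothetical weight-u = 0.3
print(unknown > known)
```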

Does that seem OK? Am I overlooking something?

Thanks
JB Fouet


_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support