You're right on both counts.

1. Unknown words are penalised with a fixed -100 score and a fixed weight
of 1. This weight can be changed, but you'll have to battle with the MERT
scripts.

2. The edge scores for lattices and confusion networks are included as
translation model scores, which works OK in most cases but not for unknown
words etc. Ideally, they should be separated out into their own class. You
may want to do that, or hack the code even more.


Hieu Hoang
www.hoang.co.uk/hieu

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
On Behalf Of Jean-Baptiste Fouet
Sent: 12 August 2008 08:42
To: [email protected]
Subject: [Moses-support] problems with unknown words costs in WordLattice
(input-type=2) decoding

Hi all, I am trying to use lattice decoding (with input-type=2, i.e. a real
word lattice, not a confusion network) and I think I have identified some
issues with the way unknown words are dealt with:

If I understand correctly, the weight associated with the penalty on a
given edge of the lattice is weight-i, which is implemented as an
additional translation table weight. But unknown words have ALL
translation weights set to 0, including that one, so the edge cost is
discarded for an unknown word (i.e. a translation generated by
ProcessOneUnknownWord). This means that if presented with two edges
carrying two different unknown words, the decoder will pick the first
edge, not the one with the smallest cost. Is that correct?
I tested this with the resources in
mosesdecoder/regression-testing/tests/lattice-distortion and the lattice:

((('UW1',0.0,1),('UW2',1.0,1),),)

The result is UW1, which is not correct (p=0.0 means a cost of -100,
p=1.0 means a cost of 0, so the edge with the smallest cost is UW2).
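To make the first problem concrete, here is a small Python sketch of the
scoring behaviour as I understand it (hypothetical code, not the actual
Moses implementation; the log-probability mapping with a -100 floor is
inferred from the costs quoted above):

```python
import math

FLOOR = -100.0

def edge_cost(p):
    """Log cost of a lattice edge, floored at -100 (p=0 -> -100, p=1 -> 0)."""
    return max(math.log(p), FLOOR) if p > 0 else FLOOR

def edge_score(p, weight_i, unknown):
    # The reported bug: an unknown word zeroes every translation-model
    # weight, so weight-i is effectively 0 and the edge cost is discarded.
    w = 0.0 if unknown else weight_i
    return w * edge_cost(p)

# Two competing edges, both labelled with unknown words:
uw1 = edge_score(0.0, weight_i=1.0, unknown=True)  # edge cost discarded
uw2 = edge_score(1.0, weight_i=1.0, unknown=True)  # edge cost discarded
# Both score 0.0, so the decoder cannot distinguish them and simply keeps
# the first edge (UW1), even though UW2 has the smaller cost.
```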

A second problem is that the feature weight associated with the unknown
word penalty cannot be modified (it is always 1), so a not-found word
always has a score penalty of -100 in addition to the LM cost.
This means that an edge with probability 0 (i.e. cost -100*weight-i + LM
costs) labelled with a known word will always be preferred to an edge with
probability 1 (i.e. edge cost 0) labelled with a not-found word (cost
-100*1 = -100, + LM costs), unless weight-i is bigger than 1.

Is that correct?
I tested with:

((('A',0.0,1),('UW',1.0,1),),)

The result is the translation of 'A', which is not what I want.
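Ignoring LM costs, the arithmetic behind this comparison can be spelled
out as follows (illustrative numbers only; the fixed -100 penalty and its
fixed weight of 1 are as described above, weight-i is assumed to be 1):

```python
weight_i = 1.0            # lattice edge-cost weight (assumed 1 here)
unknown_penalty = -100.0  # fixed score, applied with a fixed weight of 1

# Edge 'A': known word with probability 0 -> edge cost -100 * weight-i.
score_A = -100.0 * weight_i

# Edge 'UW': unknown word with probability 1 -> edge cost discarded
# (the first problem), but the -100 unknown-word penalty always applies.
score_UW = unknown_penalty * 1.0

# The two edges tie at -100, and with any weight-i below 1 the known-word
# edge scores strictly better, so the decoder outputs the translation of
# 'A' unless weight-i is bigger than 1.
```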


If I am right, I plan to correct this by:

1) applying the edge cost even for a not-found word
2) introducing a weight-u option to tune the not-found-word feature weight
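As a sketch, the intended behaviour of both fixes might look like this
(hypothetical Python, not Moses code; weight_u is the proposed new option,
not an existing Moses parameter):

```python
import math

def edge_score_fixed(p, weight_i, unknown, weight_u):
    """Proposed scoring: the edge cost applies to all words, and the
    unknown-word penalty is scaled by a tunable weight-u, not a fixed 1."""
    edge_cost = max(math.log(p), -100.0) if p > 0 else -100.0
    score = weight_i * edge_cost      # (1) edge cost even for a not-found word
    if unknown:
        score += weight_u * -100.0    # (2) tunable unknown-word penalty
    return score

# With weight_i = weight_u = 1, two unknown-word edges now differ:
# p=1.0 gives -100 (penalty only), p=0.0 gives -200 (penalty + edge cost),
# so the decoder can prefer the cheaper edge (UW2 in the first example).
```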

Does that seem OK? Am I overlooking something?

Thanks
JB Fouet


_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support

