You're right on both counts.

1. Unknown words are penalised with a fixed -100 score and a fixed weight of 1. The weight can be changed, but you'll have to battle with the MERT scripts.
2. The edge scores for lattices and confusion networks are included as translation model scores, which works OK for most cases, except for unknown words etc. Ideally, they should be separated out into their own class. You may want to do that, or hack the code even more.

Hieu Hoang
www.hoang.co.uk/hieu

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Jean-Baptiste Fouet
Sent: 12 August 2008 08:42
To: [email protected]
Subject: [Moses-support] problems with unknown words costs in WordLattice (input-type=2) decoding

Hi all,

I am trying to use lattice decoding (with input-type=2, i.e. a real word lattice, not a confusion network) and I think I have identified some issues with the way unknown words are dealt with.

If I understand correctly, the weight applied to the penalty associated with a given edge of the lattice is weight-i, which is implemented as an additional translation table weight. But unknown words have ALL translation weights set to 0, even that one, so the edge cost is discarded for an unknown word (i.e. a translation generated by ProcessOneUnknownWord). This means that if presented with two edges carrying two different unknown words, the decoder will pick the first edge, not the one with the smallest cost. Is that correct?

I tested this with the resources in mosesdecoder/regression-testing/tests/lattice-distortion and the lattice:

((('UW1',0.0,1),('UW2',1.0,1),),)

The result is UW1, which is not correct (p=0.0 means a cost of -100, p=1.0 means a cost of 0, so the smallest cost should be UW2).

A second problem is that the feature weight associated with the unknown word penalty can't be modified (it is always 1), so a not-found word always gets a score penalty of -100 in addition to the LM cost. This means that an edge with probability 0 (i.e. cost -100*weight-i + LM costs) labelled with a known word will always be preferred to an edge with probability 1 (i.e. edge cost 0) labelled with a not-found word (cost -100*1 = -100, + LM costs)
(unless weight-i is bigger than 1). Is that correct?

Tested with:

((('A',0.0,1),('UW',1.0,1),),)

The result is 1 (the translation of A), which is not what I want.

If I am right, I plan to correct it by:

1) applying the edge cost even for a not-found word
2) introducing a weight-u option to tune the not-found word feature weight

Does that seem OK? Am I overlooking something?

Thanks,
JB Fouet

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
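The cost arithmetic debated in this thread can be sketched in a few lines of Python. This is a toy model of the behaviour described above, not Moses code: the function names, the -100 floor for log(0), and the `buggy` flag are all assumptions of mine, modelling the reported symptom (unknown words lose their edge cost because all translation weights are zeroed) and JB's proposed fix.

```python
import math

UNKNOWN_PENALTY = -100.0   # fixed score for an out-of-vocabulary word (weight fixed at 1)
FLOOR = -100.0             # floor used for log(0), per the thread: p=0.0 means cost -100

def edge_cost(p):
    """Lattice edge (input) score: log(p), floored at -100 for p=0."""
    return FLOOR if p == 0.0 else math.log(p)

def score(p, known, weight_i=1.0, buggy=True):
    """Toy score of one edge, ignoring LM costs."""
    s = 0.0
    if known:
        s += weight_i * edge_cost(p)           # edge cost carried by weight-i
    else:
        s += UNKNOWN_PENALTY                   # fixed weight of 1, not tunable
        if not buggy:                          # proposed fix 1: keep the edge cost
            s += weight_i * edge_cost(p)
    return s

# First test lattice: two unknown words, p=0.0 vs p=1.0.
# With the bug, both score -100, so the decoder just takes the first edge (UW1).
assert score(0.0, known=False) == score(1.0, known=False) == -100.0
# With fix 1 applied, UW2 (p=1.0) correctly wins.
assert score(1.0, known=False, buggy=False) > score(0.0, known=False, buggy=False)

# Second test lattice: known word A with p=0.0 vs unknown word with p=1.0.
# Both score -100 when weight_i is 1; A only loses if weight_i > 1.
assert score(0.0, known=True) == score(1.0, known=False)
assert score(0.0, known=True, weight_i=1.5) < score(1.0, known=False)
```

Fix 2 (a tunable weight-u) would replace the hard-coded weight of 1 on `UNKNOWN_PENALTY` with a parameter, so MERT could trade the unknown-word penalty off against the other features.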
