This is a basic question, as I am relatively new to Moses.
Can someone tell me why the alignment is not picking up many common
words and phrases from the input? The decoder output shows many UNKs for words
that should be known.

I am experimenting with a factored model using EMS and parallel corpora from
Europarl (fr-en) and UNdoc (es-en). The decoder results show a high incidence of
UNKs in both language pairs. I reverted to a model without factors to see
whether factoring was the issue, but the incidence of UNKs in the decoder
results is very much the same. I checked the parallel input corpus and the
cleaned corpus for common terms like 'vous êtes' (French for 'you are'). Many
words and terms that the decoder shows as UNK (e.g. 'êtes') do occur in the
parallel input texts. I also sampled sentences from the parallel data and
checked them visually; the parallel corpus seems reasonably good. I tried
different sizes (100,000, 500,000 and 1.5 million parallel sentences). The
decoder results are similar for both fr-en and es-en: many unexpected UNKs.
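The corpus check described above can be scripted. A minimal sketch, assuming whitespace-tokenized UTF-8 files, i.e. the source side of the cleaned corpus and the decoder input after whatever preprocessing EMS applied to each: it lists input tokens that never occur in the training side and notes whether a case-folded or NFC-normalized variant does occur, which would point to a preprocessing mismatch rather than missing data. The script and file names in the usage comment are hypothetical.

```python
import sys
import unicodedata
from collections import Counter
from typing import Dict, Iterable


def vocabulary(lines: Iterable[str]) -> Counter:
    """Count whitespace-separated tokens in already-tokenized text."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def oov_report(train_lines: Iterable[str], test_lines: Iterable[str]) -> Dict[str, str]:
    """For each test token unseen in training, give a guess at why it is unseen."""
    vocab = vocabulary(train_lines)
    nfc_vocab = {unicodedata.normalize("NFC", w) for w in vocab}
    lower_vocab = {w.lower() for w in vocab}
    report: Dict[str, str] = {}
    for tok in vocabulary(test_lines):
        if tok in vocab:
            continue  # seen in training: not an OOV
        if unicodedata.normalize("NFC", tok) in nfc_vocab:
            report[tok] = "differs only in Unicode normalization (e.g. NFC vs NFD)"
        elif tok.lower() in lower_vocab:
            report[tok] = "differs only in case (casing/truecasing mismatch?)"
        else:
            report[tok] = "absent from training data"
    return report


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python oov_check.py corpus.clean.fr test.input.fr
    with open(sys.argv[1], encoding="utf-8") as f:
        train = f.readlines()
    with open(sys.argv[2], encoding="utf-8") as f:
        test = f.readlines()
    for tok, reason in sorted(oov_report(train, test).items()):
        print(f"{tok}\t{reason}")
```

If most reported tokens fall into the first two categories, the UNKs come from how the input was prepared rather than from gaps in the parallel data.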

I ran the LM independently (without EMS), as shown below, and saw a high
incidence of OOVs:
/apps/moses/mosesInstalls/irstlm/bin/compile-lm --text yes \
    /apps/moses/mosesInstalls/en-es/undoc.2000.en-es.lm.es.gz \
    /apps/moses/mosesInstalls/en-es/undoc.2000.en-es.arpa.es
CHECK FOR : ...../en-es/undoc.2000.en-es.arpa.es
OOV code is 641175
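To measure how much of a test text the LM vocabulary actually covers, the unigram section of the ARPA file written by the compile-lm command above can be compared with that text. A minimal sketch, assuming standard ARPA format and a test file tokenized the same way as the LM training data; the script and file names are hypothetical. It reports the fraction of running tokens missing from the LM vocabulary.

```python
import gzip
import sys
from typing import Iterable, Set, TextIO


def arpa_unigrams(lines: Iterable[str]) -> Set[str]:
    """Collect the words listed in the \\1-grams: section of an ARPA-format LM."""
    vocab: Set[str] = set()
    in_unigrams = False
    for raw in lines:
        line = raw.strip()
        if line.startswith("\\"):
            # Section markers: \data\, \1-grams:, \2-grams:, ..., \end\
            in_unigrams = line == "\\1-grams:"
            continue
        if in_unigrams and line:
            # Unigram lines look like: log10prob <TAB> word [<TAB> backoff]
            vocab.add(line.split()[1])
    return vocab


def oov_rate(vocab: Set[str], text_lines: Iterable[str]) -> float:
    """Fraction of running tokens in text_lines that are not in vocab."""
    total = unknown = 0
    for line in text_lines:
        for tok in line.split():
            total += 1
            if tok not in vocab:
                unknown += 1
    return unknown / total if total else 0.0


def open_text(path: str) -> TextIO:
    """Open plain or gzip-compressed UTF-8 text."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, encoding="utf-8")


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python arpa_oov.py undoc.2000.en-es.arpa.es test.es
    with open_text(sys.argv[1]) as lm, open_text(sys.argv[2]) as text:
        print(f"OOV rate: {oov_rate(arpa_unigrams(lm), text):.2%}")
```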

My EMS configuration uses IRSTLM, as follows:
# irstlm
lm-training = "$moses-script-dir/generic/trainlm-irst.perl -cores $cores -irst-dir $irstlm-dir -temp-dir $working-dir/lm"
settings = ""

lm-binarizer = $irstlm-dir/compile-lm
order = 5
# kenlm, also set type to 8  --- Zai added -- text yes
lm-binarizer = "$moses-bin-dir/build_binary -i"
type = 8
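The excerpt above assigns lm-binarizer twice, once for IRSTLM and once for KenLM. The sketch below flags keys assigned more than once within the same [SECTION] of an EMS-style configuration file. It assumes simple `key = value` lines and `#` comments and does not reproduce EMS's own parser, so it cannot say which assignment EMS actually uses; the "(top)" label and the file name in the usage comment are hypothetical.

```python
import re
import sys
from collections import defaultdict
from typing import Dict, Iterable, List

SECTION = re.compile(r"^\s*\[([^\]]+)\]")
ASSIGNMENT = re.compile(r"^\s*([A-Za-z0-9_-]+)\s*=")


def repeated_keys(lines: Iterable[str]) -> Dict[str, List[int]]:
    """Map "SECTION:key" to line numbers, for keys assigned more than once."""
    seen: Dict[str, List[int]] = defaultdict(list)
    section = "(top)"  # hypothetical label for lines before the first [SECTION]
    for lineno, line in enumerate(lines, start=1):
        code = line.split("#", 1)[0]  # ignore comments such as "# irstlm"
        m = SECTION.match(code)
        if m:
            section = m.group(1)
            continue
        m = ASSIGNMENT.match(code)
        if m:
            seen[f"{section}:{m.group(1)}"].append(lineno)
    return {key: nums for key, nums in seen.items() if len(nums) > 1}


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical usage: python repeated_keys.py my-ems-config
    with open(sys.argv[1], encoding="utf-8") as f:
        for key, nums in repeated_keys(f).items():
            print(f"{key} assigned on lines {nums}")
```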

The training settings are as follows:
### symmetrization method for word alignments from giza output
alignment-symmetrization-method = grow-diag-final-and
### lexicalized reordering: specify orientation type
lexicalized-reordering = msd-bidirectional-fe
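For reference, the symmetrization setting above can be illustrated in a few lines. This sketch follows the GROW-DIAG-FINAL pseudocode from the Moses training documentation, using the "-and" variant of the final step: start from the intersection of the two directional GIZA++ alignments, grow it with neighbouring points from their union, then add remaining directional points only where both words are still unaligned. It is an illustration, not Moses's own implementation, whose iteration order may differ.

```python
from typing import Set, Tuple

Point = Tuple[int, int]  # (source word index, target word index)

# Horizontal, vertical and diagonal neighbours of an alignment point.
NEIGHBORS = [(-1, 0), (0, -1), (1, 0), (0, 1),
             (-1, -1), (-1, 1), (1, -1), (1, 1)]


def grow_diag_final_and(s2t: Set[Point], t2s: Set[Point],
                        src_len: int, tgt_len: int) -> Set[Point]:
    """Symmetrize two directional word alignments of one sentence pair."""
    union = s2t | t2s
    alignment = set(s2t & t2s)  # start from the high-precision intersection

    def src_aligned(i: int) -> bool:
        return any(a == i for a, _ in alignment)

    def tgt_aligned(j: int) -> bool:
        return any(b == j for _, b in alignment)

    # GROW-DIAG: repeatedly add union points next to existing points
    # when they cover a source or target word that is still unaligned.
    added = True
    while added:
        added = False
        for i in range(src_len):
            for j in range(tgt_len):
                if (i, j) not in alignment:
                    continue
                for di, dj in NEIGHBORS:
                    p = (i + di, j + dj)
                    if p in union and p not in alignment and (
                            not src_aligned(p[0]) or not tgt_aligned(p[1])):
                        alignment.add(p)
                        added = True

    # FINAL-AND: add remaining directional points only if BOTH words
    # are still unaligned (plain "final" would require only one).
    for directional in (s2t, t2s):
        for i, j in sorted(directional):
            if not src_aligned(i) and not tgt_aligned(j):
                alignment.add((i, j))
    return alignment
```

For example, with `s2t = {(0, 0), (0, 2)}` and `t2s = {(0, 0)}` the result is `{(0, 0)}`: source word 0 is already aligned, so the "-and" final step rejects `(0, 2)`, whereas plain grow-diag-final would keep it because target word 2 is unaligned.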
Thanks for any help!

Zai
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
