This is a basic question, as I am relatively new to Moses. Can someone tell me why the word alignment is not picking up many common words and phrases from the input? The decoder output shows many UNKs for words that should be known.
I am experimenting with a factored model using EMS and parallel corpora from Europarl (fr-en) and UNdoc (es-en). The decoder output shows a high incidence of UNKs for both language pairs. I reverted to a model without factors to see whether factoring was the issue, but the incidence of UNKs in the decoder output is much the same.

I checked the parallel input corpus and the cleaned corpus for common terms like 'vous êtes' (French for 'you are'). Many words and terms that occur in the parallel input texts are shown by the decoder as UNK (e.g. 'êtes'). I sampled the parallel sentences and checked them visually, and the parallel corpus seems reasonably good. I tried different corpus sizes (100,000, 500,000 and 1.5 million parallel sentences). The decoder results are similar for both fr-en and es-en: many unexpected UNKs.

I ran the LM independently (without EMS) as below and saw a high incidence of OOVs:

    /apps/moses/mosesInstalls/irstlm/bin/compile-lm --text yes /apps/moses/mosesInstalls/en-es/undoc.2000.en-es.lm.es.gz /apps/moses/mosesInstalls/en-es/undoc.2000.en-es.arpa.es

    CHECK FOR : ...../en-es/undoc.2000.en-es.arpa.es
    OOV code is 641175

My EMS script uses IRSTLM as below:

    # irstlm
    lm-training = "$moses-script-dir/generic/trainlm-irst.perl -cores $cores -irst-dir $irstlm-dir -temp-dir $working-dir/lm"
    settings = ""
    lm-binarizer = $irstlm-dir/compile-lm
    order = 5

    # kenlm, also set type to 8 --- Zai added -- text yes
    lm-binarizer = "$moses-bin-dir/build_binary -i"
    type = 8

The training settings are as below:

    ### symmetrization method for word alignments from giza output
    alignment-symmetrization-method = grow-diag-final-and

    ### lexicalized reordering: specify orientation type
    lexicalized-reordering = msd-bidirectional-fe

Thanks for your help!
Zai
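The corpus check described above can be sketched as a small script. This is only a minimal sketch, not part of the EMS setup: it assumes whitespace-tokenized text, and the function names (`load_vocab`, `oov_report`) and the sample sentences are hypothetical. It lists the input tokens that never occur on the source side of the training corpus, and flags those that would match after lowercasing, which may help tell a real vocabulary gap from a tokenization or casing mismatch between the training data and the decoder input.

```python
from collections import Counter


def load_vocab(lines):
    """Collect the set of whitespace-separated tokens seen in the training lines."""
    vocab = set()
    for line in lines:
        vocab.update(line.split())
    return vocab


def oov_report(vocab, input_lines):
    """Count input tokens missing from vocab; return (oov_counter, total_tokens)."""
    oov = Counter()
    total = 0
    for line in input_lines:
        for tok in line.split():
            total += 1
            if tok not in vocab:
                oov[tok] += 1
    return oov, total


if __name__ == "__main__":
    # Hypothetical toy data; in practice read the cleaned source-side
    # training file and the decoder input file line by line instead.
    train = ["vous êtes ici", "nous sommes là"]
    test = ["Vous êtes là", "vous etes ici"]

    vocab = load_vocab(train)
    oov, total = oov_report(vocab, test)
    print(f"{sum(oov.values())} of {total} input tokens unseen in training")
    for tok, n in oov.most_common():
        note = " (seen when lowercased)" if tok.lower() in vocab else ""
        print(f"{tok}\t{n}{note}")
```

If most of the flagged tokens are marked as seen when lowercased, or differ from training tokens only in accents or attached punctuation, the decoder input is probably being preprocessed differently from the training corpus.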
