Dear list users,

I'm trying to build a Hungarian-English translation model. Hungarian is a morphologically rich language, so I decided to try factored training. My factors were in this format:

surface|lemma|postag|morphtag

I built models with the following parameters:

1,3-0 1-0+3-0 --decoding-steps t0,t1
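For readers less familiar with the notation: Moses numbers factors from 0, so in the format above 0 is the surface form, 1 the lemma, 2 the POS tag and 3 the morphological tag. A mapping such as "1-0" therefore reads as source lemma to target surface word. The Python sketch below is only an illustration: the function names and the toy sentence (including its tags) are made up. It parses this format and builds the "lemma morphtag" split described in the next paragraph, which makes the sentence-length cost of that workaround visible.

```python
# Hypothetical helpers for the factored format surface|lemma|postag|morphtag.
# Factor indices (0-based, as in Moses): 0=surface, 1=lemma, 2=postag, 3=morphtag.
FACTOR_NAMES = ["surface", "lemma", "postag", "morphtag"]


def parse_factored_line(line: str) -> list[dict[str, str]]:
    """Split a space-separated factored sentence into one dict per token."""
    tokens = []
    for tok in line.split():
        parts = tok.split("|")
        if len(parts) != len(FACTOR_NAMES):
            raise ValueError(f"expected {len(FACTOR_NAMES)} factors in {tok!r}")
        tokens.append(dict(zip(FACTOR_NAMES, parts)))
    return tokens


def to_lemma_morphtag(line: str) -> str:
    """Rewrite every factored token as two plain tokens: 'lemma morphtag'."""
    out = []
    for t in parse_factored_line(line):
        out.extend([t["lemma"], t["morphtag"]])
    return " ".join(out)


# Made-up toy sentence; the tags are illustrative, not from a real tagger.
example = "a|a|DET|_ házban|ház|NOUN|Case=Ine"
split = to_lemma_morphtag(example)
print(split)  # a _ ház Case=Ine
print(len(example.split()), "->", len(split.split()))  # 2 -> 4
```

Because every word becomes two tokens, a 3-gram language model over the split text only covers about 1.5 original words of context, which is the n-gram shortening the next paragraph worries about.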
I didn't get what I expected: the translation barely improved compared to surface-to-surface translation.

I tried a different format. I replaced every "surface" word with "lemma morphtag" (separated by a space, as if they were two different words). As expected, this gave much better results from a morphological point of view. But it can't stay like this, because the sentences become longer and this "virtually reduces the n-gram length". (I hope I managed to express myself clearly.)

My question is: why is this different from building with "1-0+3-0 --decoding-steps t0,t1"? I thought it would be exactly the same. Am I missing something?

Can anybody who has successfully built factored models for a morphologically rich language help?

Thanks in advance!

Br,
Attila

_______________________________________________
Moses-support mailing list
http://mailman.mit.edu/mailman/listinfo/moses-support
