Dear list users,

I'm trying to build a Hungarian-English translation model. Hungarian
is a morphologically rich language, so I decided to try out factored
training. My factors were in this format:
surface|lemma|postag|morphtag
I built models with the following parameters:
1,3-0
1-0+3-0 --decoding-steps t0,t1

I didn't get what I expected: the translation barely improved
compared to surface-to-surface translation.
Then I tried a different format: I replaced every "surface" word with
"lemma morphtag" (separated by a space, as if they were two different
words). This gave much better results, as expected from a
morphological point of view. But it can't stay like this, because the
sentences become longer and the n-gram length is "virtually reduced":
each n-gram now spans fewer original words. (I hope I managed to
express myself clearly.)
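
To make the second setup concrete, the conversion was roughly
equivalent to the sketch below. The function names and the example
tags (DET, NOUN, PL.INE) are made up for illustration only, and the
code assumes that no factor contains a "|" or a space.

```python
def split_factored_token(token):
    """Split 'surface|lemma|postag|morphtag' into its four factors."""
    surface, lemma, postag, morphtag = token.split("|")
    return surface, lemma, postag, morphtag


def to_lemma_morphtag(sentence):
    """Replace each factored token with two plain tokens: lemma, morphtag."""
    out = []
    for token in sentence.split():
        _, lemma, _, morphtag = split_factored_token(token)
        out.extend([lemma, morphtag])
    return " ".join(out)


def words_covered(ngram_order, tokens_per_word=2):
    """Original words spanned by one n-gram after the split."""
    return ngram_order / tokens_per_word


# Hypothetical input line; the tags are illustrative, not a real tagset.
line = "a|a|DET|DET házakban|ház|NOUN|PL.INE"
print(to_lemma_morphtag(line))
print(words_covered(3))
```

Every original word becomes two tokens, so the sentence length
doubles and a 3-gram covers only one and a half original words. That
is the "virtual reduction" I mean.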

My question is: why is this different from building with "1-0+3-0
--decoding-steps t0,t1"? I thought it would be exactly the same.

Am I missing something? Can anybody who has successfully built
factored models for a morphologically rich language help?

Thanks in advance!

Br,
Attila
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support