Hi Saso & Hieu,

Thank you for your replies!

This is what I am currently doing: starting with simpler models (on a very 
short corpus, so I get quicker feedback). I tried this configuration with 
translation-factors = "word+stem+pos -> word+stem+pos" and it works (giving me 
the best results so far, around 28-29 BLEU).

Whenever I try to add a generation step, however simple it might be, it 
crashes at the TUNING:tune phase.
Here is what I fail to understand: in order to use a factor in a generation 
step, say for example:

generation-factors = "word+stem -> pos"

Do you first need to translate the left-hand side factors? E.g. "word -> 
word,stem -> stem" or "word+stem -> word+stem".
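
As far as I understand Moses, generation steps operate on the target side 
only, so the left-hand side factors of a generation step must already have 
been produced on the output side by an earlier step in decoding-steps. The 
Python sketch below is a hypothetical sanity check, not part of Moses or EMS: 
it walks the decoding steps and reports any step that reads a factor that is 
not yet available. The names parse_steps and check_decoding_steps are made up 
for illustration, and the parsing assumes the EMS-style strings quoted in this 
thread.

```python
# Hypothetical helper, NOT part of Moses or EMS. It only checks that each
# decoding step reads factors that are available when the step runs.

def parse_steps(spec):
    """Parse e.g. "stem -> pos,stem+pos -> word" into [(inputs, outputs), ...]."""
    steps = []
    for part in spec.split(","):
        if not part.strip():
            continue  # an empty spec such as "" defines no steps
        left, right = part.split("->")
        steps.append((left.strip().split("+"), right.strip().split("+")))
    return steps


def check_decoding_steps(input_factors, translation, generation, decoding_steps):
    """Return a list of problems; an empty list means the step order is consistent."""
    t_steps = parse_steps(translation)
    g_steps = parse_steps(generation)
    source = set(input_factors.split())  # factors annotated on the input side
    target = set()                       # output-side factors produced so far
    problems = []
    for name in decoding_steps.split(","):
        name = name.strip()
        kind, index = name[0], int(name[1:])
        steps = t_steps if kind == "t" else g_steps
        if index >= len(steps):
            problems.append(f"{name}: no such step defined")
            continue
        ins, outs = steps[index]
        # Translation steps read source factors; generation steps read
        # target factors (assumption stated in the lead-in).
        available, side = (source, "source") if kind == "t" else (target, "target")
        missing = [f for f in ins if f not in available]
        if missing:
            problems.append(f"{name}: {side} factor(s) {missing} not available")
        target.update(outs)
    return problems
```

Under these assumptions, check_decoding_steps("word stem pos", "word+stem -> 
word+stem", "word+stem -> pos", "t0,g0") returns an empty list, whereas pairing 
the same generation step with translation-factors = "word -> word" reports that 
stem is missing for g0. The check only covers factor availability; it says 
nothing about why tuning crashes.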

Thank you for your help!
________________________________
From: Hieu Hoang [[email protected]]
Sent: 01 August 2016 20:50
To: Gmehlin Floran
Cc: [email protected]
Subject: Re: [Moses-support] Factored model configuration using stems and POS

I would start simple, then build it up once I know what it's doing, e.g. start 
with
    input-factors = word stem pos
    output-factors = word stem pos
    alignment-factors = "word -> word"
    translation-factors = "word+stem+pos -> word+stem+pos"
    reordering-factors = "word -> word"
    generation-factors = ""
    decoding-steps = "t0"


Hieu Hoang
http://www.hoang.co.uk/hieu

On 27 July 2016 at 11:46, Gmehlin Floran 
<[email protected]> wrote:
Hi,

I have been trying to build a factored translation model using stems and 
part-of-speech tags for a week now and I cannot get satisfactory results. This 
probably comes from my factor configuration, as I probably do not fully 
understand how it works (I am following the paper "Factored Translation 
Models" by Koehn and Hoang).

I previously built a standard phrase-based model (with the same corpus) which 
gave me a BLEU score of around 24-25 (DE-EN). For my current factored model, 
the BLEU score is around 1 (?).

I tried using the moses.ini files (tuned or not) to see if I could get 
something translated by copy/pasting some lines from the original corpus, but 
it only translates from German to German and does not recognize most of the 
words, if not all.

The motivation behind the factored model is that there are too many OOVs with 
the standard phrase-based model, so I wanted to try using stems to reduce them.

I am annotating the corpus with TreeTagger and the factor configuration is as 
follows:

input-factors = word stem pos
output-factors = word stem pos
alignment-factors = "word+stem -> word+stem"
translation-factors = "stem -> stem,pos -> pos"
reordering-factors = "word -> word"
generation-factors = "stem -> pos,stem+pos -> word"
decoding-steps = "t0,g0,t1,g1"

Is there something wrong with that?
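
For reference, Moses reads factored text as tokens whose factors are joined by 
"|", in the order given by input-factors and output-factors. A minimal sketch 
of building such lines from TreeTagger-style (word, POS, lemma) triples is 
shown below. The helper to_factored is hypothetical, the lemma stands in for 
the stem, and both the fallback for "<unknown>" lemmas and the "&#124;" 
escaping are assumptions about the preprocessing, not something taken from 
this thread.

```python
# Hypothetical preprocessing helper, NOT part of Moses or TreeTagger.

def escape_factor(text):
    """Escape the factor separator so it cannot be mistaken for a delimiter.

    Assumption: "|" inside a token is written as "&#124;".
    """
    return text.replace("|", "&#124;")


def to_factored(tagged_tokens):
    """Turn one sentence of (word, pos, lemma) triples into 'word|stem|pos' tokens."""
    tokens = []
    for word, pos, lemma in tagged_tokens:
        # Assumption: fall back to the surface form when no lemma is known.
        if lemma == "<unknown>":
            lemma = word
        # Factor order matches "input-factors = word stem pos".
        factors = (word, lemma, pos)
        tokens.append("|".join(escape_factor(f) for f in factors))
    return " ".join(tokens)
```

Each sentence then becomes one line of space-separated "word|stem|pos" tokens, 
and the same factor order has to be used consistently on both sides of the 
corpus.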

I only use a single language model over surface forms, as the LM over POS 
yields a segmentation fault in the tuning phase.

Does anyone have an idea how I should configure my model to exploit stems in 
the source language?

Thanks a lot,

Floran

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support

