Hi Kabir,

1. I don't think 4,000 sentences is enough training data. As a comparison, the
Europarl fr-en corpus contains 1,800,000 sentences.

2. You need about 4 GB of RAM to run training.

3. You should try the basic training first, without factors, before doing anything
more advanced.

Please follow this guide to familiarise yourself with MT and Moses:
  http://www.statmt.org/moses_steps.html
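
On your third query: a POS-based LM is normally trained on text that contains
only the POS tags, so you would derive a tags-only file from the factored
corpus. Below is a minimal sketch of that step, not an official Moses tool. It
assumes the factor order surface-word|lemma|POS-tag from your message, so the
POS tag is factor index 2; the function names and file paths are hypothetical.

```python
def extract_factor(line, index=2, sep="|"):
    """Reduce every token on a line to a single factor.

    Assumes each token looks like surface|lemma|POS, so index=2
    selects the POS tag. Raises ValueError on a malformed token.
    """
    selected = []
    for token in line.split():
        factors = token.split(sep)
        if index >= len(factors):
            raise ValueError("token %r has no factor %d" % (token, index))
        selected.append(factors[index])
    return " ".join(selected)


def extract_factor_file(src_path, dst_path, index=2):
    """Write a copy of src_path that keeps only one factor per token."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(extract_factor(line, index) + "\n")
```

For example, extract_factor("the|the|DT house|house|NN") gives "DT NN". The
resulting tags-only file can then be passed to whichever LM toolkit you use.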

Hieu
Sent from my flying horse

On 18 Feb 2012, at 09:24 AM, Kabir Joshi <[email protected]> wrote:

> Hello All,
> 
> I am a student and new to MT. I have configured the Moses baseline system for
> the English-Punjabi language pair, and it is working fine. Now I wish to train
> a factored SMT system. For this I have trained 2000 sentences in
> surface-word|lemma|POS-tag format for both sides and will train another 2000.
> 
> My queries are
> 1. Would 4000 sentences be enough?
> 2. What memory overhead should I expect?
> 3. Would training a POS-based LM require a separate file containing only the POS tags?
> 
> Please answer my queries.
> 
> Kabir.
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support


