Hi Kabir,

1. I don't think 4000 sentences is enough training data. As a comparison, the Europarl fr-en corpus contains 1,800,000 sentences.
2. You need about 4 GB of RAM to run training.
3. You should try the basic training first, without factors, before doing anything more advanced.

Please follow this guide to get yourself familiar with MT and Moses:
http://www.statmt.org/moses_steps.html

Hieu
Sent from my flying horse

On 18 Feb 2012, at 09:24 AM, Kabir Joshi <[email protected]> wrote:

> Hello All,
>
> I am a student and new to MT. I have configured a Moses baseline system
> for the English-Punjabi language pair, and it is working fine. Now I wish
> to train factored SMT. For this I have prepared 2000 sentences in
> surface-word|lemma|POS-tag format for both sides and will prepare another
> 2000.
>
> My queries are:
> 1. Would 4000 sentences be enough?
> 2. What memory overhead should I expect?
> 3. Would training a POS-based LM require a separate file containing only
>    the POS tags?
>
> Please answer my queries.
>
> Kabir.
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
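
On question 3 in the quoted message, which the reply does not address: as far as I know, LM toolkits train on plain token sequences, so a POS-based LM would normally be trained on text made up of the tags only. One way to get that text is to derive it from the factored corpus. Below is a minimal Python sketch, assuming each token is written as surface|lemma|POS with the tag as the third factor (index 2). The function name and the sample sentence are hypothetical and are not part of Moses.

```python
def extract_factor(line, index=2, sep="|"):
    """Return one factor of every token in a factored sentence.

    Assumes tokens are whitespace-separated and factors are joined by `sep`,
    e.g. surface|lemma|POS, so index 2 selects the POS tag.
    """
    selected = []
    for token in line.split():
        parts = token.split(sep)
        if index >= len(parts):
            # Fail loudly instead of silently writing a misaligned sentence.
            raise ValueError(f"token {token!r} has no factor at index {index}")
        selected.append(parts[index])
    return " ".join(selected)


if __name__ == "__main__":
    # Hypothetical factored sentence, for illustration only.
    print(extract_factor("the|the|DT houses|house|NNS"))  # prints: DT NNS
```

Applying this to every line of the target-side training file would give a tag-only file with one sentence per line, which could then be passed to whichever LM toolkit you use.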
