Hi Roberto,

To add to Barry's answer for (2): if you're translating into English, you need a parallel corpus for training and a monolingual English corpus to train the language model. You can use the English side of your parallel corpus as your LM corpus, but you will generally benefit from using a much larger corpus if you have one.
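As a rough sketch of that step, assuming SRILM is on your path and your tokenized English text is in a file called corpus.tok.en (the tool choice and the file names are just placeholder assumptions), a 3-gram model can be estimated with:

  # estimate an interpolated, Kneser-Ney smoothed 3-gram language model
  ngram-count -order 3 -interpolate -kndiscount \
      -text corpus.tok.en -lm english.lm.arpa

The resulting ARPA file is the language model you point your moses.ini at; if you have a larger monolingual corpus, that is what goes into -text instead of the English side of the parallel data.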
Best,
Suzy

On 27/01/11 8:39 AM, Barry Haddow wrote:
> Hi Roberto
>
> Some answers inline
>
> On Wednesday 26 Jan 2011 15:50:19 Roberto Rios wrote:
>> Hello. I finished installing GIZA++ and Moses, and it runs well. Now I am
>> proceeding to install mgiza for multithreading, and I am having some file
>> issues.
>>
>> 1. In "http://www.statmt.org/wmt07/baseline.html":
>>
>>    - Copy GIZA++ and mkcls to a bin location for Moses Scripts
>>      mkdir -p bin
>>      cp GIZA++-v2/GIZA++ bin/
>>      cp GIZA++-v2/snt2cooc.out bin/
>>      cp mkcls-v2/mkcls bin/
>>
>> 1.1) Where do I have to copy mgiza, mkcls, mergealignment.py and
>> snt2cooc?
>
> Same place as GIZA++ (a copy sketch follows below this message).
>
>> 1.2) Do I have to replace the old mkcls and snt2cooc.out with the new
>> ones coming with mgiza?
>
> Should be the same.
>
>> 1.3) Is there a difference between snt2cooc and snt2cooc.out?
>
> Not as far as I know.
>
>> 2. The corpus that has been tokenized for the LM: is it the same corpus
>> as the English corpus?
>
> If you're translating into English, then you use English text to build the LM.
>
>> 3. Does tuning take longer than training? It took my server a couple of
>> days for tuning and 4 hours for training. Would the time for tuning get
>> better after the first run?
>
> Yes, tuning can take a couple of days for a big model.
>
>> 4. How do I feed directories of corpora into my system? I am able to run
>> the tutorial already mentioned, but that is only one corpus; I have a lot
>> of corpora organized in directories, and doing them one by one would be a
>> killer.
>
> The best idea is to concatenate the corpora together (a sketch follows
> below this message).
>
>> 5. If I get a new corpus, do I need to run training and tuning all over
>> again? It seems that training uses the old trained model and merges the
>> new corpus into it; is that correct?
>
> If you update your corpus, then you need to go back to the beginning. The
> standard training pipeline works in batch mode.
>
>> 6. I have the latest version of Moses. The only script I have is
>> train-model.perl, but from what I read it is better to use
>> train-factored-phrase-model.perl. I do not have it in my Moses or in
>> scripts/2011....../training.
>
> train-factored-phrase-model.perl is now called train-model.perl
>
> best regards
> Barry

--
Suzy Howlett
http://www.showlett.id.au/
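Picking up 1.1 from the quoted message: a minimal sketch of copying the MGIZA tools into the same bin/ directory used for GIZA++. The mgizapp/... paths and the exact script name are only assumptions about where your build put the files; adjust them to your own layout.

  # copy the MGIZA binaries and the alignment-merging script next to GIZA++
  cp mgizapp/bin/mgiza bin/
  cp mgizapp/bin/snt2cooc bin/
  cp mgizapp/bin/mkcls bin/
  cp mgizapp/scripts/merge_alignment.py bin/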
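For question 4, a minimal sketch of concatenating per-directory corpora into a single training pair before running the usual pipeline. The corpora/*/ layout and the fr/en file extensions are just assumptions about how the data is organized; the important point is that both sides are concatenated in the same order so the sentence alignment is preserved.

  # build one combined parallel corpus from many sub-corpora
  for d in corpora/*/ ; do
      cat "$d"corpus.fr >> train.all.fr
      cat "$d"corpus.en >> train.all.en
  done
  # the two line counts must match, otherwise the alignment is broken
  wc -l train.all.fr train.all.en

You then tokenize, clean, and train on train.all.fr / train.all.en exactly as in the single-corpus tutorial.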
