Hi Roberto,

To add to Barry's answer for (2): if you're translating into English, 
you need a parallel corpus for training and a monolingual English corpus 
to train the language model. You can use the English side of your 
parallel corpus as your LM corpus, but you will generally benefit from 
using a much larger corpus if you have one.
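
For example, here is a minimal sketch of that setup, assuming you have
SRILM's ngram-count on your PATH (the file names are just illustrative):

  # English side of the parallel corpus plus extra monolingual English text
  cat corpus.en extra-news.en > lm-corpus.en
  # Train a 3-gram LM with modified Kneser-Ney smoothing
  ngram-count -order 3 -interpolate -kndiscount -text lm-corpus.en -lm english.lm

You would then point your Moses configuration (moses.ini) at english.lm
rather than at an LM built from the parallel corpus alone.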

Best,
Suzy

On 27/01/11 8:39 AM, Barry Haddow wrote:
> Hi Roberto
>
> Some answers inline
>
> On Wednesday 26 Jan 2011 15:50:19 Roberto Rios wrote:
>> Hello. I finished installing GIZA++ and Moses, and they run well. Now I am
>> proceeding to install MGIZA for multithreading, but I am having some file issues.
>>
>> 1. in "http://www.statmt.org/wmt07/baseline.html";
>>
>>
>>     - Copy GIZA++ and mkcls to a bin location for Moses Scripts
>>     mkdir -p bin
>>     cp GIZA++-v2/GIZA++ bin/
>>     cp GIZA++-v2/snt2cooc.out bin/
>>     cp mkcls-v2/mkcls bin/
>>
>>        1.1) Where do I have to copy mgiza, mkcls, mergealignment.py and
>> snt2cooc?
>
> Same place as GIZA++
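> (so, following the baseline instructions quoted above, something like the
> following; the source paths are illustrative and depend on where you built
> mgiza)
>
>     cp mgiza/mgiza bin/
>     cp mgiza/mkcls bin/
>     cp mgiza/snt2cooc bin/
>     cp mgiza/mergealignment.py bin/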
>
>>        1.2) Do I have to replace the old mkcls and snt2cooc.out with the new
>> ones coming with mgiza?
>
> Should be the same.
>
>>        1.3) Is there a difference between snt2cooc and snt2cooc.out?
>
> Not as far as I know.
>
>>
>> 2. The corpus that is being tokenized for the LM: is it the same as the
>> English corpus?
>
> If you're translating into English, then you use English text to build the LM.
>
>>
>> 3. Does tuning take longer than training? It took my server a couple of
>> days for tuning and 4 hours for training. Would the time for tuning get
>> better after the first run?
>
> Yes, tuning can take a couple of days for a big model.
>
>>
>> 4. How do I feed directories of corpora into my system? I am able to run
>> the tutorial already mentioned, but that is only one corpus; I have a lot of
>> corpora organized in directories, and doing them one by one would be a
>> killer.
>>
>
> The best idea is to concatenate the corpora together.
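> For example (file names illustrative; concatenate both sides in the same
> order so the lines stay aligned):
>
>     cat dir1/corpus.fr dir2/corpus.fr > all.fr
>     cat dir1/corpus.en dir2/corpus.en > all.en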
>
>> 5. If I get a new corpus, do I need to run training and tuning all over
>>   again? It seems that training uses the old trained model and merges the
>>   new corpus into it. Is that correct?
>
> If you update your corpus, then you need to go back to the beginning. The
> standard training pipeline works in batch mode.
>
>>
>> 6. I have the latest version of Moses. The only script I have is
>> train-model.perl, but from what I read it is better to use
>> train-factored-phrase-model.perl. I do not have it in my moses or
>> scripts/2011....../training directory.
>>
>
> train-factored-phrase-model.perl is now called train-model.perl
>
> best regards
> Barry
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support

-- 
Suzy Howlett
http://www.showlett.id.au/
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
