Dear anh Bach,
Certainly, I check all of these things!
I even limit the number of words per sentence to under 40.
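
For reference, the two checks quoted below (empty sentence pairs and overly long sentence pairs) can be sketched in Python roughly as follows. This is only an illustration, not the script actually used on this corpus: the names `is_clean`, `filter_pairs`, and `clean_files`, the whitespace tokenization, and the inclusive 40-token cap are all assumptions made for the example.

```python
from typing import Iterable, Iterator, Tuple

MAX_WORDS = 40  # assumed cap: keep sentences with at most 40 whitespace tokens


def is_clean(src: str, tgt: str, max_words: int = MAX_WORDS) -> bool:
    """Return True if both sides are non-empty and at most max_words tokens long."""
    src_len = len(src.split())
    tgt_len = len(tgt.split())
    return 1 <= src_len <= max_words and 1 <= tgt_len <= max_words


def filter_pairs(
    pairs: Iterable[Tuple[str, str]], max_words: int = MAX_WORDS
) -> Iterator[Tuple[str, str]]:
    """Yield only the sentence pairs that pass both checks, stripped of newlines."""
    for src, tgt in pairs:
        if is_clean(src, tgt, max_words):
            yield src.strip(), tgt.strip()


def clean_files(
    src_path: str,
    tgt_path: str,
    out_src_path: str,
    out_tgt_path: str,
    max_words: int = MAX_WORDS,
) -> Tuple[int, int]:
    """Filter two line-aligned files; return (kept, total) pair counts.

    Note: zip() stops at the shorter file, so a line-count mismatch between
    the two sides should be checked separately before training.
    """
    kept = total = 0
    with open(src_path, encoding="utf-8") as fs, \
         open(tgt_path, encoding="utf-8") as ft, \
         open(out_src_path, "w", encoding="utf-8") as out_s, \
         open(out_tgt_path, "w", encoding="utf-8") as out_t:
        for src, tgt in zip(fs, ft):
            total += 1
            if is_clean(src, tgt, max_words):
                out_s.write(src.strip() + "\n")
                out_t.write(tgt.strip() + "\n")
                kept += 1
    return kept, total
```

A call such as `clean_files("train.fr", "train.en", "clean.fr", "clean.en")` (hypothetical file names) would write the filtered corpus and report how many pairs were dropped, which is a quick way to confirm that no empty or oversized pair reaches GIZA++.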
On Tue, Nov 6, 2012 at 2:13 PM, Nguyen Bach <[email protected]> wrote:

> Cuong,
>
> I guess the problem is not with your server; it is an error when running
> GIZA.
> GIZA is quite a stable tool. When this kind of problem happens, you can
> first go back to your training data and perform the following checks:
> 1. Is there any empty sentence pair? An empty sentence pair is a pair
> that is empty on the source side, the target side, or both.
> 2. Is there any exceptionally long sentence pair?
>
> Nguyen
>
>  On Mon, Nov 5, 2012 at 8:37 PM, Cuong Hoang <[email protected]> wrote:
>
>>  Hi all,
>> I use a server which has 130 GB of RAM and 24 cores.
>> I have a question about how much training data I can use.
>>
>> In fact, I want to train an SMT system on a very large bilingual corpus
>> such as WMT 2010 (or NIST) to see what the highest BLEU score I could
>> obtain is (though I know that it also depends heavily on the test set size).
>>
>> However, I usually get errors during Moses training, so I
>> have to truncate the data to obtain a smaller training corpus. If I do not
>> truncate it, I usually run into errors such as:
>>
>> ERROR: Execution of: /home/cuongh/CODE/giza-pp/GIZA++  -CoocurrenceFile
>> /home/cuongh/STATMT.BIG/giza.fr-en/fr-en.cooc -c
>> /home/cuongh/STATMT.BIG/corpus/fr-en-int-train.snt -m1 5 -m2 3 -m3 3 -m4 0
>> -mh 0 -model1dumpfrequency 1 -model4smoothfactor 0.4 -nodumps 1 -nsmooth 4
>> -o /home/cuongh/STATMT.BIG/giza.fr-en/fr-en -onlyaldumps 1 -p0 0.999 -s
>> /home/cuongh/STATMT.BIG/corpus/en.vcb -t
>> /home/cuongh/STATMT.BIG/corpus/fr.vcb
>> *  died with signal 11, with coredump*
>>
>> I just wonder, for a server like mine, what is the largest
>> training corpus I could train on?
>> In addition, for training Moses on very large bilingual data, what
>> would the experts here recommend?
>>
>> I really need it.
>> I love working on SMT but frankly, I'm now just a Master student, not a
>> PhD. However, I will graduate soon.
>> Thanks,
>> Best regards,
>> C. Hoang
>>  --
>> Hoàng Cường
>> SMTNerd
>>
>>
>> _______________________________________________
>> Moses-support mailing list
>> [email protected]
>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>
>>
>


-- 
Hoàng Cường
SMTNerd