Hi Cuong

Our servers are of a similar spec, and we can train with all the WMT data 
(15M+ sentences for fr-en) and also train large NIST systems. But you 
really want to use mgiza (multi-threaded giza) rather than GIZA++, since 
the latter will take weeks to align a large corpus.
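
As a rough illustration of switching the alignment step to mgiza, the 
sketch below only assembles a train-model.perl command line and prints it; 
it does not run anything. All paths, the corpus stem, and the language 
model file are hypothetical placeholders, and the flags (-mgiza, 
-mgiza-cpus, -external-bin-dir) should be checked against the 
train-model.perl in your own Moses checkout before use.

```python
import shlex


def build_train_command(scripts_dir, work_dir, corpus_stem, src, tgt,
                        lm_path, bin_dir, cpus):
    """Assemble a train-model.perl argument list that aligns with MGIZA.

    Every path passed in is a placeholder chosen for illustration.
    """
    return [
        f"{scripts_dir}/training/train-model.perl",
        "-root-dir", work_dir,
        "-corpus", corpus_stem,          # stem of the parallel corpus files
        "-f", src, "-e", tgt,            # source and target language suffixes
        "-alignment", "grow-diag-final-and",
        "-lm", f"0:3:{lm_path}",         # factor:order:file (hypothetical LM)
        "-external-bin-dir", bin_dir,    # directory holding the mgiza binaries
        "-mgiza",                        # use MGIZA instead of GIZA++
        "-mgiza-cpus", str(cpus),        # number of alignment threads
    ]


# Hypothetical values; 24 matches the core count mentioned in the thread.
cmd = build_train_command(
    scripts_dir="/path/to/mosesdecoder/scripts",
    work_dir="/path/to/work/train",
    corpus_stem="/path/to/corpus/train.clean",
    src="fr",
    tgt="en",
    lm_path="/path/to/lm/train.en.lm",
    bin_dir="/path/to/mgiza/bin",
    cpus=24,
)

print(shlex.join(cmd))
```

Building the command as a list keeps each argument separate, so the same 
list can later be handed to subprocess.run without shell quoting problems.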

cheers - Barry

On 06/11/12 04:37, Cuong Hoang wrote:
> Hi all,
> I use a server with 130GB of RAM and 24 cores.
> I am wondering how much training data I can use.
>
> In fact, I want to train an SMT system on a very large bilingual 
> corpus such as WMT 2010 (or NIST) to see what is the highest BLEU 
> score I can obtain (though I know that it also depends heavily on 
> the size of the test set).
>
> However, I usually run into errors during Moses training and have to 
> truncate the data to get a smaller training corpus. If I do not 
> truncate it, I usually get stuck on errors such as:
>
> ERROR: Execution of: /home/cuongh/CODE/giza-pp/GIZA++ -CoocurrenceFile 
> /home/cuongh/STATMT.BIG/giza.fr-en/fr-en.cooc -c 
> /home/cuongh/STATMT.BIG/corpus/fr-en-int-train.snt -m1 5 -m2 3 -m3 3 
> -m4 0 -mh 0 -model1dumpfrequency 1 -model4smoothfactor 0.4 -nodumps 1 
> -nsmooth 4 -o /home/cuongh/STATMT.BIG/giza.fr-en/fr-en -onlyaldumps 1 
> -p0 0.999 -s /home/cuongh/STATMT.BIG/corpus/en.vcb -t 
> /home/cuongh/STATMT.BIG/corpus/fr.vcb
> *died with signal 11, with coredump*
>
> I am just wondering: for a server like mine, what is the largest 
> training corpus I could train on?
> In addition, for training Moses on very large bilingual data, what 
> would the experts here recommend?
>
> I really need it.
> I love working on SMT but frankly, I'm just a Master's student for now, 
> not a PhD. However, I will graduate soon.
> Tks,
> Best regards,
> C. Hoang
> -- 
> Hoàng Cường
> SMTNerd
>
>
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support


-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

