Hi Cuong,

Our servers have a similar spec, and we can train with all the WMT data (15M+ sentences for fr-en) and also train large NIST systems. But you really want to use mgiza (multi-threaded GIZA++) rather than GIZA++, since the latter will take weeks to align a large corpus.
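
As a rough sketch of switching the aligner, the snippet below builds a `train-model.perl` command line that requests mgiza and gives it one alignment thread per core. All paths are placeholders, the helper name `build_train_command` is made up for illustration, and a full run needs further options (for example the language model) that are omitted here.

```python
import shlex


def build_train_command(train_script, root_dir, corpus_stem, src, tgt, bin_dir, cpus):
    """Return the argument list for a Moses training run that aligns with mgiza.

    Every path passed in is a placeholder chosen by the caller; nothing here
    checks that the files exist.
    """
    if cpus < 1:
        raise ValueError("cpus must be at least 1")
    return [
        "perl", train_script,
        "-root-dir", root_dir,          # working directory for the training run
        "-corpus", corpus_stem,         # corpus stem, without the language suffix
        "-f", src,                      # source language suffix
        "-e", tgt,                      # target language suffix
        "-external-bin-dir", bin_dir,   # directory holding the mgiza binaries
        "-mgiza",                       # use mgiza instead of GIZA++
        "-mgiza-cpus", str(cpus),       # number of alignment threads
    ]


# Hypothetical paths; 24 matches the core count mentioned in the thread.
cmd = build_train_command(
    train_script="/path/to/mosesdecoder/scripts/training/train-model.perl",
    root_dir="/path/to/work",
    corpus_stem="/path/to/corpus/train",
    src="fr",
    tgt="en",
    bin_dir="/path/to/mgiza/bin",
    cpus=24,
)

# Print a shell-safe version of the command for inspection before running it.
print(shlex.join(cmd))
```

Building the command as a list keeps the arguments easy to inspect and can be passed straight to `subprocess.run(cmd, check=True)` once the paths point at a real installation.
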
cheers - Barry

On 06/11/12 04:37, Cuong Hoang wrote:
> Hi all,
>
> I use a server with 130GB of RAM and 24 cores.
> I am wondering how much training data I could use.
>
> In fact, I want to train an SMT system from a very large bilingual
> corpus such as WMT 2010 (or NIST) to see what the highest BLEU
> score I could obtain is (though I know that it also depends heavily on
> the size of the test set).
>
> However, I usually run into errors during Moses training and have to
> truncate the data to obtain a smaller training corpus. If I do not
> truncate it, I usually get stuck on errors such as:
>
> ERROR: Execution of: /home/cuongh/CODE/giza-pp/GIZA++ -CoocurrenceFile
> /home/cuongh/STATMT.BIG/giza.fr-en/fr-en.cooc -c
> /home/cuongh/STATMT.BIG/corpus/fr-en-int-train.snt -m1 5 -m2 3 -m3 3
> -m4 0 -mh 0 -model1dumpfrequency 1 -model4smoothfactor 0.4 -nodumps 1
> -nsmooth 4 -o /home/cuongh/STATMT.BIG/giza.fr-en/fr-en -onlyaldumps 1
> -p0 0.999 -s /home/cuongh/STATMT.BIG/corpus/en.vcb -t
> /home/cuongh/STATMT.BIG/corpus/fr.vcb
> *died with signal 11, with coredump*
>
> For a server like mine, what is the largest training set I could
> train on?
> In addition, for training Moses on very large bilingual data, what
> would the experts here recommend?
>
> I really need it.
> I love working on SMT but frankly, I'm just a Master's student for now,
> not a PhD. However, I will graduate soon.
>
> Thanks,
> Best regards,
> C. Hoang
> --
> Hoàng Cường
> SMTNerd
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support

--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
