Dear Bach,

Certainly, I checked all of these things! I even limited every sentence to fewer than 40 words.

On Tue, Nov 6, 2012 at 2:13 PM, Nguyen Bach <[email protected]> wrote:
> Cuong,
>
> I guess the problem is not your server; it is an error when running
> GIZA. GIZA is quite a stable tool. When this kind of problem happens, you can
> first go back to your training data and perform the following checks:
> 1. Is there any empty sentence pair? An empty sentence pair is a pair
> that is empty on the source side, the target side, or both.
> 2. Is there any exceptionally long sentence pair?
>
> Nguyen
>
> On Mon, Nov 5, 2012 at 8:37 PM, Cuong Hoang <[email protected]> wrote:
>
>> Hi all,
>> I use a server with 130 GB of RAM and 24 cores.
>> I have a question about how much training data I can use.
>>
>> In fact, I want to train an SMT system on a very large bilingual corpus
>> such as WMT 2010 (or NIST) to see the highest BLEU score I can
>> obtain (though I know it also depends heavily on the test set).
>>
>> However, I often get errors during Moses training, so I have to
>> truncate the corpus to a smaller size. If I do not truncate it,
>> I usually get stuck on errors such as:
>>
>> ERROR: Execution of: /home/cuongh/CODE/giza-pp/GIZA++ -CoocurrenceFile
>> /home/cuongh/STATMT.BIG/giza.fr-en/fr-en.cooc -c
>> /home/cuongh/STATMT.BIG/corpus/fr-en-int-train.snt -m1 5 -m2 3 -m3 3 -m4 0
>> -mh 0 -model1dumpfrequency 1 -model4smoothfactor 0.4 -nodumps 1 -nsmooth 4
>> -o /home/cuongh/STATMT.BIG/giza.fr-en/fr-en -onlyaldumps 1 -p0 0.999 -s
>> /home/cuongh/STATMT.BIG/corpus/en.vcb -t
>> /home/cuongh/STATMT.BIG/corpus/fr.vcb
>> died with signal 11, with coredump
>>
>> For a server like mine, what is the largest training corpus I could
>> train on?
>> In addition, what would the experts here recommend for training Moses
>> on very large bilingual data?
>>
>> I really need this.
>> I love working on SMT, but frankly, I am just a Master's student, not a
>> PhD student. However, I will graduate soon.
>> Thanks,
>> Best regards,
>> C. Hoang
>> --
>> Hoàng Cường
>> SMTNerd
>>
>> _______________________________________________
>> Moses-support mailing list
>> [email protected]
>> http://mailman.mit.edu/mailman/listinfo/moses-support
>
--
Hoàng Cường
SMTNerd
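The two corpus checks suggested in this thread (empty sentence pairs and exceptionally long sentence pairs) can be scripted before GIZA++ is run. Below is a minimal Python sketch, assuming a plain-text, line-aligned parallel corpus (one sentence per line, whitespace-tokenized). The function name check_parallel_corpus is hypothetical and not part of Moses, and the 40-word default simply mirrors the limit mentioned in the reply above; it is not a GIZA++ requirement.

```python
from typing import Iterable, List, Tuple


def check_parallel_corpus(
    src_lines: Iterable[str],
    tgt_lines: Iterable[str],
    max_words: int = 40,  # illustrative limit, taken from the reply above
) -> List[Tuple[int, str]]:
    """Return (1-based line number, reason) for each suspicious sentence pair.

    Hypothetical helper: it only reports problems, it does not filter the corpus.
    """
    src_list = list(src_lines)
    tgt_list = list(tgt_lines)
    # A parallel corpus must be line-aligned; zip() would silently drop extras.
    if len(src_list) != len(tgt_list):
        raise ValueError(
            f"line count mismatch: {len(src_list)} source vs {len(tgt_list)} target"
        )

    problems: List[Tuple[int, str]] = []
    for i, (src, tgt) in enumerate(zip(src_list, tgt_list), start=1):
        # str.split() with no argument also treats whitespace-only lines as empty.
        n_src = len(src.split())
        n_tgt = len(tgt.split())
        if n_src == 0 or n_tgt == 0:
            # Check 1: empty on the source side, the target side, or both.
            problems.append((i, "empty side"))
        elif n_src > max_words or n_tgt > max_words:
            # Check 2: exceptionally long pair on either side.
            problems.append((i, f"too long ({n_src}/{n_tgt} words)"))
    return problems


def check_files(src_path: str, tgt_path: str, max_words: int = 40) -> List[Tuple[int, str]]:
    """Run the checks on two aligned text files (paths are placeholders)."""
    with open(src_path, encoding="utf-8") as f_src, open(tgt_path, encoding="utf-8") as f_tgt:
        return check_parallel_corpus(f_src, f_tgt, max_words)


if __name__ == "__main__":
    src = ["le chat", "", "un " * 50]
    tgt = ["the cat", "hello", "a"]
    for line_no, reason in check_parallel_corpus(src, tgt):
        print(line_no, reason)
```

On the toy input, line 2 is reported as having an empty side and line 3 as too long (50 source words). Moses also ships a cleaning script, clean-corpus-n.perl, that removes pairs outside a length range; this sketch only reports the offending line numbers so they can be inspected by hand.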
