Hi Barry,

./clean-corpus-n.perl in trunk/scripts/training returned the following error:
./clean-corpus-n.perl corpus/* txt txt clean 1 50
clean-corpus.perl: processing corpus/200EnglishSens.txt.corpus/200HindiSens.txt & .txt to txt, cutoff clean-1
Use of uninitialized value $opn in open at ./clean-corpus-n.perl line 46.
Use of uninitialized value $opn in concatenation (.) or string at ./clean-corpus-n.perl line 46.
Can't open '' at ./clean-corpus-n.perl line 46.

Using train-factored-phrase-model.perl returned the following error:

Using SCRIPTS_ROOTDIR: /home/nakul/mosesdecoder/trunk/scripts
Using single-thread GIZA
ERROR: Cannot find mkcls, GIZA++, & snt2cooc.out in .
Did you install this script using 'make release'? at ./train-factored-phrase-model.perl line 205.

It seems that Moses does not recognize GIZA++ and mkcls; they are installed in different directories. I want to train them separately. Is it possible to do so?

Regarding the vcb files, I got them by executing the following command:

sudo ./plain2snt.out 200ESens.txt 200HSens.txt

which creates en.vcb, hn.vcb and bitext files (200ESens_200HSens.snt, 200HSens_200ESens.snt) in GIZA++ format.

--
Thanks & Regards,
nakul.

On Mon, Jan 31, 2011 at 3:54 PM, Barry Haddow <[email protected]> wrote:
> Hi Nakul
>
> Clean-corpus will get rid of long lines and lines with a high length ratio,
> which GIZA doesn't like. This could fix your first error.
>
> Run ./clean-corpus-n.perl --help for usage instructions.
>
> As to the second error, if you're not using the Moses scripts, how did you
> create the vcb files? It looks as though they don't match the corpus.
>
> best regards - Barry
>
> On Monday 31 January 2011 10:17, nakul sharma wrote:
> > Hi Barry,
> >
> > I am not training GIZA through Moses; I am training it independently.
> > Will it make any difference? Anyway, I do not have clean-corpus-n.perl
> > in GIZA. Please tell me what to do about it.
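For what it's worth, the cleaning step Barry describes (dropping over-long lines and sentence pairs with an extreme length ratio, the same ratio that triggers the GIZA++ fertility warning below) can be sketched roughly in Python. clean-corpus-n.perl is the real tool; this is only an illustration, and the thresholds and example sentences are made up:

```python
# Sketch of the corpus-cleaning idea: keep a sentence pair only if both
# sides fall within [min_len, max_len] tokens and the length ratio is
# not extreme (GIZA++'s fertility model complains above a ratio of 9).
def keep_pair(src, tgt, min_len=1, max_len=50, max_ratio=9.0):
    ns, nt = len(src.split()), len(tgt.split())
    if not (min_len <= ns <= max_len and min_len <= nt <= max_len):
        return False                      # too short or too long
    if max(ns, nt) / min(ns, nt) > max_ratio:
        return False                      # length ratio too extreme
    return True

pairs = [
    ("a", "b c d e f g h i j k l"),       # ratio 11:1 -> dropped
    ("this is fine", "yeh theek hai"),    # kept
]
kept = [p for p in pairs if keep_pair(*p)]
```

Running a filter like this over both sides of the parallel corpus before plain2snt.out is exactly what removes the "length ration more than the maximum allowed limit" warnings.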
> >
> > On Mon, Jan 31, 2011 at 3:07 PM, Barry Haddow <[email protected]> wrote:
> > > Hi Nakul
> > >
> > > Did you clean your corpus first (i.e. run clean-corpus-n.perl over it)?
> > >
> > > best regards - Barry
> > >
> > > On Monday 31 January 2011 04:20, nakul sharma wrote:
> > > > Hi all,
> > > >
> > > > I have g++ version 4.4.3 and Ubuntu 10.04 LTS. While training
> > > > GIZA++, I get the following error upon execution of the GIZA++ exe file:
> > > >
> > > > Reading vocabulary file from:200ESens.vcb
> > > > Reading vocabulary file from:200HSens.vcb
> > > > {WARNING:(a)truncated sentence 0}{WARNING:(a)truncated sentence 1}
> > > > WARNING: The following sentence pair has source/target sentence length ration
> > > > more than the maximum allowed limit for a source word fertility
> > > > source length = 1 target length = 11 ratio 11 ferility limit : 9
> > > > Shortening sentence
> > > > Sent No: 3 , No. Occurrences: 1
> > > > 0 254
> > > > 57 5 3 58 59 60 5 61 62 63 64
> > > >
> > > > I get this warning for almost every sentence number, and then for
> > > > sentence number 98 I get this error message:
> > > >
> > > > Sent No: 98 , No. Occurrences: 1
> > > > 0 457 458
> > > > 909 910 15 911 17 86 912 913 65 3 914 915 22 916 11 917 170 162 918 919
> > > > 3 684 22 8 920 921 22 8 333 922 923 924 22 925
> > > > ERROR: target word 937 is not in the vocabulary list.
> > > >
> > > > GIZA++ has generated only one file **.root.gfcs.
> > > >
> > > > Please tell me how to deal with this problem.
> > >
> > > --
> > > The University of Edinburgh is a charitable body, registered in
> > > Scotland, with registration number SC005336.
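The "target word 937 is not in the vocabulary list" error means the .snt file references a word id that the .vcb file never defines, which is what Barry means by the vcb files not matching the corpus. A rough sketch of such a consistency check, assuming the usual GIZA++ file layouts (a .vcb line is "id word count"; a .snt file is three-line blocks of pair count, source ids, target ids) and with made-up example data:

```python
# Sanity check for GIZA++ inputs: every word id used on the target side
# of the .snt file must be declared in the target .vcb file.
def missing_target_ids(vcb_lines, snt_lines):
    # .vcb format: "id word count" per line
    vocab = {int(line.split()[0]) for line in vcb_lines if line.strip()}
    missing = set()
    # .snt format: blocks of 3 lines (count, source ids, target ids)
    for i in range(0, len(snt_lines), 3):
        target_ids = (int(t) for t in snt_lines[i + 2].split())
        missing |= {t for t in target_ids if t not in vocab}
    return missing

vcb = ["2 house 5", "3 dog 2"]       # hypothetical target .vcb
snt = ["1", "2 3", "2 937"]          # id 937 is never declared above
bad = missing_target_ids(vcb, snt)
```

If this turns up ids, the .vcb and .snt files were built from different versions of the corpus (e.g. one was regenerated after cleaning and the other was not), and rerunning plain2snt.out on the current files should bring them back in sync.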
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
