Hi Nakul,

clean-corpus-n.perl will get rid of long sentences and sentence pairs with a
high length ratio, which GIZA++ doesn't like. This could fix your first error.
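Since you don't have the Moses scripts to hand, here is a rough sketch of the kind of filtering that cleaning does. It is not the actual clean-corpus-n.perl: the function names and the default limits (80 tokens, ratio 9) are assumptions chosen for illustration, and it simply splits on whitespace.

```python
def keep_pair(src, tgt, min_len=1, max_len=80, max_ratio=9.0):
    """Return True if a sentence pair passes the length and ratio checks.

    The default limits are illustrative assumptions, not the script's defaults.
    """
    n_src, n_tgt = len(src.split()), len(tgt.split())
    # Drop pairs where either side is empty or too long.
    if not (min_len <= n_src <= max_len and min_len <= n_tgt <= max_len):
        return False
    # Drop pairs whose lengths differ by more than max_ratio in either direction.
    return max(n_src, n_tgt) / min(n_src, n_tgt) <= max_ratio


def clean_corpus(src_lines, tgt_lines, **limits):
    """Filter two parallel lists of sentences, keeping them aligned."""
    kept = [(s, t) for s, t in zip(src_lines, tgt_lines) if keep_pair(s, t, **limits)]
    return [s for s, _ in kept], [t for _, t in kept]
```

With these limits, a pair like the one in your log (source length 1, target length 11) would be dropped before GIZA++ ever sees it. The vcb and snt files would then need to be regenerated from the cleaned text.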
Run ./clean-corpus-n.perl --help for usage instructions.

As to the second error, if you're not using the Moses scripts, how did you
create the vcb files? It looks as though they don't match the corpus.

best regards - Barry

On Monday 31 January 2011 10:17, nakul sharma wrote:
> Hi Barry,
>
> i am not training giza through moses. i am training it independently. Will
> it make any difference ? Anyways i do not have clean-corpus-n.perl in
> giza. please tell what to do of it ?
>
> On Mon, Jan 31, 2011 at 3:07 PM, Barry Haddow wrote:
> > Hi Nakul
> >
> > Did you clean your corpus first (ie run clean-corpus-n.perl over it) ?
> >
> > best regards - Barry
> >
> > On Monday 31 January 2011 04:20, nakul sharma wrote:
> > > hi all,
> > >
> > > i have having g++ version 4.4.3 and ubuntu 10.04 LTS, while training
> > > GIZA++, i get following error upon execution of GIZA++ exe file:-
> > >
> > > Reading vocabulary file from:200ESens.vcb
> > > Reading vocabulary file from:200HSens.vcb
> > > {WARNING:(a)truncated sentence 0}{WARNING:(a)truncated sentence 1}WARNING:
> > > The following sentence pair has source/target sentence length ration
> > > more than the maximum allowed limit for a source word fertility
> > > source length = 1 target length = 11 ratio 11 ferility limit : 9
> > > Shortening sentence
> > > Sent No: 3 , No. Occurrences: 1
> > > 0 254
> > > 57 5 3 58 59 60 5 61 62 63 64
> > >
> > > like this for almost all the Sent No, i get this warning and then for a
> > > sentence number 98 i get this error message:-
> > >
> > > Sent No: 98 , No. Occurrences: 1
> > > 0 457 458
> > > 909 910 15 911 17 86 912 913 65 3 914 915 22 916 11 917 170 162 918 919
> > > 3 684 22 8 920 921 22 8 333 922 923 924 22 925
> > > ERROR: target word 937 is not in the vocabulary list.
> > >
> > > Giza++ has generated only one file **.root.gfcs.
> > >
> > > Please tell how to deal with this problem.
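P.S. On the second error: one way to check whether the vcb files match the corpus is to look for word ids in the sentence file that are missing from the vocabularies. The sketch below assumes the usual GIZA++ layouts (each vcb line is "id word count"; each sentence pair in the snt file takes three lines: occurrence count, source ids, target ids). The function names and the snt file name in the usage lines are hypothetical.

```python
def load_vocab_ids(vcb_path):
    """Collect the word ids from a .vcb file (assumed 'id word count' per line)."""
    ids = set()
    with open(vcb_path, encoding="utf-8") as f:
        for line in f:
            fields = line.split()
            if fields:
                ids.add(int(fields[0]))
    return ids


def find_unknown_ids(snt_path, src_ids, tgt_ids):
    """Yield (sentence_no, side, word_id) for ids missing from the vocabularies."""
    with open(snt_path, encoding="utf-8") as f:
        lines = [line.split() for line in f]
    # Each sentence pair is assumed to take three lines: count, source ids, target ids.
    for i in range(0, len(lines) - 2, 3):
        sent_no = i // 3 + 1
        for side, vocab, row in (("source", src_ids, lines[i + 1]),
                                 ("target", tgt_ids, lines[i + 2])):
            for tok in row:
                if int(tok) not in vocab:
                    yield sent_no, side, int(tok)
```

It could be run along these lines, where "corpus.snt" stands in for whatever your sentence file is called:

    src_ids = load_vocab_ids("200ESens.vcb")
    tgt_ids = load_vocab_ids("200HSens.vcb")
    for hit in find_unknown_ids("corpus.snt", src_ids, tgt_ids):
        print(hit)

Any output would point to the vcb and snt files having been built from different versions of the text.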
-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
