Hi Sreeja,

The error below occurs because you are using a GIZA++ binary that was compiled to require a co-occurrence file, but you are not supplying one. When GIZA++ is compiled as part of the Moses toolkit, it is typically built to use co-occurrence files, which gives better memory usage (and reduces computation time).
Hence, there are two ways to solve this problem:

1.- compile GIZA++ without the co-occurrence file option, or
2.- provide a co-occurrence file to the GIZA++ binary you have already compiled.

To produce a co-occurrence file, use the following command:

snt2cooc.out <vcb1> <vcb2> <snt12>

In your case, this would be:

snt2cooc.out corp.en.vcb corp.ta.vcb corp.en_corp.ta.snt > corp.cooc

This generates a co-occurrence file, corp.cooc, which you then need to pass to GIZA++ as follows:

./GIZA++ -S corp.en.vcb -T corp.ta.vcb -C corp.en_corp.ta.snt -CoocurrenceFile corp.cooc

The snt2cooc.out binary can be found in the GIZA++ compilation directory.

Cheers,
Germán Sanchis-Trilles

Quoting "sreeja B.P" <[email protected]>:

> sir,
>
> while running the Giza++ by the command
>
> ./GIZA++ -S corp.en.vcb -T corp.ta.vcb -C corp.en_corp.ta.snt
>
> we are getting as below :
>
> what is the coocurrence file ?
>
> how to rectify this problem and run Giza++ ?
>
> reading vocabulary files
> Source vocabulary list has 35497 unique tokens
> Target vocabulary list has 71683 unique tokens
> Calculating vocabulary frequencies from corpus corp.en_corp.ta.snt
> Reading more sentence pairs into memory ...
> Corpus fits in memory, corpus has: 14035 sentence pairs.
> Train total # sentence pairs (weighted): 14035
> Size of source portion of the training corpus: 330148 tokens
> Size of the target portion of the training corpus: 262033 tokens
> In source portion of the training corpus, only 35496 unique tokens appeared
> In target portion of the training corpus, only 71681 unique tokens appeared
> lambda for PP calculation in IBM-1,IBM-2,HMM:= 262033/(344183-14035)==
> 0.793683
> ERROR: NO COOCURRENCE FILE GIVEN
> Aborted
>
> thank you
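The original question also asked what a co-occurrence file is. Roughly, it lists which (source word ID, target word ID) pairs ever appear together in the same sentence pair, so GIZA++ only needs to keep translation-table entries for those pairs. The sketch below approximates that computation with awk, assuming the usual .snt layout of three lines per sentence pair (a count, then the source word IDs, then the target word IDs). `make_cooc` is a hypothetical helper, not part of GIZA++: unlike the real snt2cooc.out it ignores the .vcb files, and it does not add any special handling for the NULL word (ID 0), so its output may differ from what the real binary writes.

```shell
#!/bin/sh
# Sketch only: approximates what snt2cooc.out computes; it is NOT the real tool.
# Assumes the GIZA++ .snt layout of three lines per sentence pair:
#   line 1: count (weight), line 2: source word IDs, line 3: target word IDs.
# Prints each distinct "source_id target_id" pair that co-occurs in at least
# one sentence pair, sorted numerically. NULL-word (ID 0) handling is omitted.
make_cooc() {
  awk '
    NR % 3 == 2 { src = $0; next }      # remember the source-side IDs
    NR % 3 == 0 {                       # target-side IDs: emit every pair
      ns = split(src, s, " ")
      nt = split($0, t, " ")
      for (i = 1; i <= ns; i++)
        for (j = 1; j <= nt; j++)
          print s[i], t[j]
    }
  ' "$1" | sort -u -k1,1n -k2,2n       # drop duplicate pairs, sort by IDs
}
```

For example, `make_cooc corp.en_corp.ta.snt | head` would show a few of the pairs for the corpus in this thread; for actual training, use the real snt2cooc.out as described above.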
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
