Hi Sreeja,

The error below occurs because you are using a GIZA++ binary that was
compiled to use cooccurrence files, but you are not passing it one.
Typically, when you compile GIZA++ as part of the Moses toolkit, it is
built to use cooccurrence files, which lowers memory usage and reduces
computation time.

Hence, there are two ways to solve this problem:

1.- Recompile GIZA++ without the cooccurrence file option.
2.- Provide a cooccurrence file to the GIZA++ binary you have already compiled.

To produce a cooccurrence file, use the following command, which
writes the result to standard output:

snt2cooc.out <vcb1> <vcb2> <snt12>

In your case, this would look like this:

snt2cooc.out corp.en.vcb corp.ta.vcb corp.en_corp.ta.snt > corp.cooc

This will generate a cooccurrence file, i.e. corp.cooc, which you will
need to pass to GIZA++ as:

./GIZA++ -S corp.en.vcb -T corp.ta.vcb -C corp.en_corp.ta.snt -CoocurrenceFile corp.cooc

The snt2cooc.out binary can be found in the GIZA++ compilation directory.
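
If it helps, the two commands above could be wrapped in a small shell
function along these lines. This is only a sketch under a few
assumptions: GIZA_DIR, DRY_RUN and the function name align are names
made up here for illustration (they are not part of GIZA++), and both
binaries are assumed to live in the same directory. Only the
snt2cooc.out and GIZA++ invocations themselves come from the commands
given above.

```shell
#!/bin/sh
# Sketch only: build the cooccurrence file, then run GIZA++ with it.
# GIZA_DIR and DRY_RUN are hypothetical variables introduced for this example.

# Directory holding the GIZA++ and snt2cooc.out binaries (assumed: current dir).
GIZA_DIR="${GIZA_DIR:-.}"

# align SRC_VCB TGT_VCB SNT COOC
# With DRY_RUN=1 the two commands are printed instead of executed.
align() {
    src_vcb=$1
    tgt_vcb=$2
    snt=$3
    cooc=$4

    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$GIZA_DIR/snt2cooc.out $src_vcb $tgt_vcb $snt > $cooc"
        echo "$GIZA_DIR/GIZA++ -S $src_vcb -T $tgt_vcb -C $snt -CoocurrenceFile $cooc"
        return 0
    fi

    # Step 1: snt2cooc.out output is redirected into the cooccurrence file.
    "$GIZA_DIR/snt2cooc.out" "$src_vcb" "$tgt_vcb" "$snt" > "$cooc" || return 1

    # Step 2: pass that file to GIZA++ via -CoocurrenceFile.
    "$GIZA_DIR/GIZA++" -S "$src_vcb" -T "$tgt_vcb" -C "$snt" -CoocurrenceFile "$cooc"
}
```

With your files, the call would be
"align corp.en.vcb corp.ta.vcb corp.en_corp.ta.snt corp.cooc". Setting
DRY_RUN=1 beforehand lets you check the generated commands before
actually running them.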

Cheers,

Germán Sanchis-Trilles



Quoting "sreeja B.P" <[email protected]>:

> sir,
>
> while running the Giza++ by the command
>
>  ./GIZA++ -S corp.en.vcb -T corp.ta.vcb -C corp.en_corp.ta.snt
>
>  we are getting as below :
>
>
>
>
>
>
> what is the coocurrence file ?
>
> how to rectify this problem and run Giza++ ?
>
>
>
>
> reading vocabulary files
> Source vocabulary list has 35497 unique tokens
> Target vocabulary list has 71683 unique tokens
> Calculating vocabulary frequencies from corpus corp.en_corp.ta.snt
> Reading more sentence pairs into memory ...
> Corpus fits in memory, corpus has: 14035 sentence pairs.
>  Train total # sentence pairs (weighted): 14035
> Size of source portion of the training corpus: 330148 tokens
> Size of the target portion of the training corpus: 262033 tokens
> In source portion of the training corpus, only 35496 unique tokens appeared
> In target portion of the training corpus, only 71681 unique tokens appeared
> lambda for PP calculation in IBM-1,IBM-2,HMM:= 262033/(344183-14035)==
> 0.793683
> ERROR: NO COOCURRENCE FILE GIVEN
> Aborted
>
>
>
>
> thank you
>





_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
