Sir,

When running GIZA++ with the command

 ./GIZA++ -S corp.en.vcb -T corp.ta.vcb -C corp.en_corp.ta.snt

we get the error shown in the output at the end of this message.

What is the cooccurrence file?

How can we fix this problem and run GIZA++?




reading vocabulary files
Source vocabulary list has 35497 unique tokens
Target vocabulary list has 71683 unique tokens
Calculating vocabulary frequencies from corpus corp.en_corp.ta.snt
Reading more sentence pairs into memory ...
Corpus fits in memory, corpus has: 14035 sentence pairs.
 Train total # sentence pairs (weighted): 14035
Size of source portion of the training corpus: 330148 tokens
Size of the target portion of the training corpus: 262033 tokens
In source portion of the training corpus, only 35496 unique tokens appeared
In target portion of the training corpus, only 71681 unique tokens appeared
lambda for PP calculation in IBM-1,IBM-2,HMM:= 262033/(344183-14035)== 0.793683
ERROR: NO COOCURRENCE FILE GIVEN
Aborted




Thank you.
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
