Sir, while running GIZA++ with the command

./GIZA++ -S corp.en.vcb -T corp.ta.vcb -C corp.en_corp.ta.snt

we get the output below. What is the cooccurrence file, and how do we fix this problem so that GIZA++ runs?

reading vocabulary files
Source vocabulary list has 35497 unique tokens
Target vocabulary list has 71683 unique tokens
Calculating vocabulary frequencies from corpus corp.en_corp.ta.snt
Reading more sentence pairs into memory ...
Corpus fits in memory, corpus has: 14035 sentence pairs.
Train total # sentence pairs (weighted): 14035
Size of source portion of the training corpus: 330148 tokens
Size of the target portion of the training corpus: 262033 tokens
In source portion of the training corpus, only 35496 unique tokens appeared
In target portion of the training corpus, only 71681 unique tokens appeared
lambda for PP calculation in IBM-1,IBM-2,HMM:= 262033/(344183-14035)== 0.793683
ERROR: NO COOCURRENCE FILE GIVEN
Aborted

Thank you
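For anyone hitting the same error, here is a minimal Python sketch of what the cooccurrence file is assumed to contain. As far as we understand it, GIZA++ distributions ship a helper, snt2cooc (snt2cooc.out in some builds), that reads the two .vcb files and the .snt file and writes this file, which is then passed to GIZA++ with the -CoocurrenceFile option. For example, "./snt2cooc.out corp.en.vcb corp.ta.vcb corp.en_corp.ta.snt > corp.en_corp.ta.cooc", followed by the original command with "-CoocurrenceFile corp.en_corp.ta.cooc" appended. Please check the usage message of your own build, since the argument order may differ between versions. The sketch assumes the .snt layout of three lines per sentence pair (a count, the source word ids, the target word ids), that id 0 stands for the NULL source word, and that the file lists each co-occurring "source_id target_id" pair once, sorted. The names cooccurrences and write_cooc are hypothetical and are not part of GIZA++.

```python
"""Illustrative sketch of the contents of a GIZA++ cooccurrence file.

Assumptions (not taken from GIZA++ documentation):
  * the .snt file has three lines per sentence pair: a count,
    the source word ids, then the target word ids;
  * source id 0 (the NULL word) co-occurs with every target word;
  * the output is one sorted, de-duplicated "source_id target_id"
    pair per line.
"""

from typing import Iterable, List, Set, Tuple


def cooccurrences(snt_lines: Iterable[str]) -> List[Tuple[int, int]]:
    """Return the sorted, de-duplicated (source_id, target_id) pairs."""
    lines = [ln.rstrip("\n") for ln in snt_lines]
    pairs: Set[Tuple[int, int]] = set()
    # Walk the file in blocks of three lines (one sentence pair each).
    for k in range(0, len(lines) - 2, 3):
        # lines[k] holds the sentence-pair count; it is not needed here.
        src = [int(tok) for tok in lines[k + 1].split()]
        tgt = [int(tok) for tok in lines[k + 2].split()]
        src.append(0)  # the NULL source word may align to any target word
        for s in src:
            for t in tgt:
                pairs.add((s, t))
    return sorted(pairs)


def write_cooc(snt_path: str, cooc_path: str) -> None:
    """Write one "source_id target_id" line per co-occurring pair."""
    with open(snt_path, encoding="utf-8") as fin:
        pairs = cooccurrences(fin)
    with open(cooc_path, "w", encoding="utf-8") as fout:
        for s, t in pairs:
            fout.write(f"{s} {t}\n")
```

For example, cooccurrences(["1", "2 3", "4 5", "1", "3", "4"]) returns [(0, 4), (0, 5), (2, 4), (2, 5), (3, 4), (3, 5)]. This only illustrates the format; in practice the snt2cooc helper that comes with GIZA++ is the tool to use.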
_______________________________________________ Moses-support mailing list [email protected] http://mailman.mit.edu/mailman/listinfo/moses-support
