Hi all,

The dictionary problem is finally solved: the "-d" option works well. The problem was caused by a silly mistake of mine. I had converted the dictionary file to UTF-8, while the other files are encoded as 7-bit ASCII. Sorry to have bothered you for such a long time... I really appreciate your kind help, especially Mark Fishel and Chris Dyer. You have helped this newcomer a lot ;)
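In case it helps anyone check for the same mistake, here is a rough sketch in Python (not part of GIZA++) that labels the encoding of a file's raw bytes. The function names and the four labels are made up for illustration. The check for a UTF-8 byte-order mark is only my guess at one way a "converted to UTF-8" file can differ from plain ASCII; I have not confirmed that this was the actual difference in my files.

```python
import codecs


def describe_encoding(data: bytes) -> str:
    """Roughly classify raw bytes (hypothetical helper, labels are my own)."""
    # A UTF-8 byte-order mark at the start is invisible in most editors.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-bom"
    # Pure 7-bit ASCII: every byte is below 0x80.
    if all(b < 0x80 for b in data):
        return "ascii"
    # Otherwise see whether the bytes at least decode as UTF-8.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "other"


def check_files(paths):
    """Map each path to its rough encoding label."""
    result = {}
    for path in paths:
        with open(path, "rb") as f:
            result[path] = describe_encoding(f.read())
    return result
```

Calling check_files on the dictionary, the .vcb files and the .snt file should give the same label for all of them; if one differs, that file is the one to convert.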
When I googled this dictionary problem, all I found was my own question. So, for those who want to use a dictionary and don't know how, here's my advice:

1. Well... make sure all your text files use the same encoding.
2. Check your GIZA++ source code, find the variable "useDict", and make sure it is set to true.
3. Add a "-d" option to your command, followed by your dictionary file.

The dictionary should be in this format:

    target-word-id source-word-id

It must be sorted by target-word-id.

Here's my command line (you may need to know what the options set to 0 or 1 do, or a lot of files will be generated):

    ./GIZA++ \
    -CoocurrenceFile korean-chinese.cooc \
    -c korean-chinese-int-train.snt \
    -m1 5 -m2 0 -mh 5 -m3 3 -m4 3 \
    -model1dumpfrequency 1 \
    -model2dumpfrequency 1 \
    -model345dumpfrequency 1 \
    -hmmdumpfrequency 1 \
    -model4smoothfactor 0.4 \
    -nbestalignments 1 \
    -onlyaldumps 0 \
    -nodumps 0 \
    -nsmooth 4 \
    -d ck.txt \
    -o korean-chinese \
    -onlyaldumps 1 \
    -p0 0.999 \
    -s chinese.vcb \
    -t korean.vcb

2009-12-23

Best regards,
Lee Xianhua
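P.S. The dictionary format described above (two word IDs per line, sorted by the target word ID) can be checked before running GIZA++. The following is only a minimal sketch in Python, not anything shipped with GIZA++. The function name check_dictionary is hypothetical, and I am assuming that the IDs are plain decimal integers separated by whitespace and that "sorted" means non-decreasing numeric order of the first column.

```python
def check_dictionary(lines):
    """Return a list of problems; an empty list means the lines look fine.

    Assumed format (my reading of the description, not verified against
    the GIZA++ source): "target-word-id source-word-id" on each line,
    in non-decreasing order of target-word-id.
    """
    problems = []
    prev = None
    for n, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # this check simply ignores blank lines
        # Both fields must be plain ASCII digits; this also flags a stray
        # byte-order mark glued to the first ID.
        if len(fields) != 2 or not all(f.isascii() and f.isdigit() for f in fields):
            problems.append(f"line {n}: expected two numeric IDs, got {line.strip()!r}")
            continue
        target_id = int(fields[0])
        if prev is not None and target_id < prev:
            problems.append(f"line {n}: target ID {target_id} comes after {prev}")
        prev = target_id
    return problems
```

For example, with the ck.txt from my command line, one could open the file with open("ck.txt", encoding="utf-8") and pass the file object to check_dictionary; any returned message points at a line to fix or re-sort.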
