Hi all,

The dictionary problem is finally solved: the "-d" option works well.
The problem was caused by a silly mistake of mine.
I had converted the dictionary file to UTF-8,
while the other files are encoded as 7-bit ASCII.
Sorry to have bothered you for such a long time...
I really appreciate your kind help, especially from Mark Fishel and Chris Dyer.
You have helped this newbie a lot ;)

When I googled this dictionary problem, all I found was my own question.
So, for those who want to use a dictionary and don't know how, here is my advice:
1. Make sure all your text files use the same encoding.
2. Check your GIZA++ source code, find the variable "useDict", and make sure it is
set to true.
3. Add a "-d" option to your command, followed by your dictionary file.
The dictionary should be in this format:
target-word-id source-word-id
It must be sorted by target-word-id.
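
For anyone who wants to check points 1 and 3 before running GIZA++, here is a
small Python sketch. It is only an illustration and not part of GIZA++: the
function names are made up, and it assumes that the word ids are plain integers
and that "sorted" means ascending numeric order of the target-word-id.

```python
def is_7bit_ascii(data):
    """Return True if every byte is below 0x80.

    A UTF-8 byte order mark or any non-ASCII character fails this check.
    """
    return all(b < 0x80 for b in data)


def parse_dictionary(text):
    """Parse 'target-word-id source-word-id' lines into (int, int) pairs."""
    pairs = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split()
        if len(fields) != 2:
            raise ValueError(
                "line %d: expected 2 fields, got %d" % (line_no, len(fields))
            )
        pairs.append((int(fields[0]), int(fields[1])))
    return pairs


def sort_by_target_id(pairs):
    """Sort pairs by target-word-id (numeric order is an assumption).

    sorted() is stable, so entries sharing a target id keep their order.
    """
    return sorted(pairs, key=lambda pair: pair[0])


def format_dictionary(pairs):
    """Write the pairs back out, one 'target-id source-id' pair per line."""
    return "".join("%d %d\n" % (t, s) for t, s in pairs)
```

To use it, read each file in binary mode (open(name, "rb").read()) and pass the
bytes to is_7bit_ascii(); for the dictionary, decode the bytes as ASCII, run
parse_dictionary() and sort_by_target_id(), and write the result of
format_dictionary() back to the file you give to "-d".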

Here is my command line:
(pay attention to the options that are set to 0 or 1; otherwise a lot of files
will be generated)

./GIZA++ \
  -CoocurrenceFile korean-chinese.cooc \
  -c korean-chinese-int-train.snt \
  -m1 5 -m2 0 -mh 5 -m3 3 -m4 3 \
  -model1dumpfrequency 1 \
  -model2dumpfrequency 1 \
  -model345dumpfrequency 1 \
  -hmmdumpfrequency 1 \
  -model4smoothfactor 0.4 \
  -nbestalignments 1 \
  -onlyaldumps 0 \
  -nodumps 0 \
  -nsmooth 4 \
  -d ck.txt \
  -o korean-chinese \
  -onlyaldumps 1 \
  -p0 0.999 \
  -s chinese.vcb \
  -t korean.vcb





2009-12-23 



Best regards,

Lee Xianhua
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
