I am trying to build a syntax-based baseline, using FBIS data as the training 
set. 
However, the rule table I get is far too small, and the system cannot translate 
anything.

The steps I followed to build the baseline are as follows:

Training data: 234,348 lines on both the Chinese and English sides of the FBIS 
data.
For example:
En: the effort against corruption has been intensified , echoing the antismuggling campaign .
Zh: 反腐败 力度 加大 , 与 反走私 形成 呼应 之 势 。
I followed the guideline:
Step 0: Parse the English side with ZPar.
Sample output:
(S (NP (NP (DT the) (NN effort)) (PP (IN against) (NP (NN corruption)))) (VP 
(VBZ has) (VP (VBN been) 
(VP (VBN intensified) (, ,) (S (VP (VBG echoing) (NP (DT the) (JJ 
antismuggling) (NN campaign))))))) (. .))  
Step 1: Wrap the syntactic trees with 
/moses/scripts/training/wrappers/berkeleyparsed2mosesxml.perl
Sample output:
<tree label="S"> <tree label="NP"> <tree label="NP"> <tree label="DT"> the 
</tree> <tree label="NN"> effort </tree> 
</tree> <tree label="PP"> <tree label="IN"> against </tree> <tree label="NP"> 
<tree label="NN"> corruption </tree> 
</tree> </tree> </tree> <tree label="VP"> <tree label="VBZ"> has </tree> <tree 
label="VP"> <tree label="VBN"> been 
</tree> <tree label="VP"> <tree label="VBN"> intensified </tree> <tree 
label=","> , </tree> <tree label="S"> <tree label="VP">
 <tree label="VBG"> echoing </tree> <tree label="NP"> <tree label="DT"> the 
</tree> <tree label="JJ"> antismuggling </tree> 
<tree label="NN"> campaign </tree> </tree> </tree> </tree> </tree> </tree> 
</tree> <tree label="."> . </tree> </tree> 
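
One possible sanity check on this output, sketched in Python: confirm that every line has balanced <tree> tags and that the words left after stripping the tags match the tokenized English sentence. The function name check_tree_line and the regular expression are illustrative assumptions, not part of Moses. The sketch also assumes that each sentence sits on a single line in the real file (the sample above is only wrapped by the mail client) and that tags are written exactly as <tree label="...">.

```python
import re

# Matches an opening <tree label="..."> tag or a closing </tree> tag.
TAG = re.compile(r'<tree label="[^"]*">|</tree>')


def check_tree_line(line):
    """Return (balanced, words) for one line of <tree label="..."> markup."""
    depth = 0
    balanced = True
    for tag in TAG.findall(line):
        if tag == "</tree>":
            depth -= 1
            if depth < 0:  # a close tag appeared before its open tag
                balanced = False
        else:
            depth += 1
    if depth != 0:  # some open tag was never closed
        balanced = False
    # Whatever remains once the tags are removed is the plain token sequence.
    words = TAG.sub(" ", line).split()
    return balanced, words
```

Comparing the returned words with the corresponding line of the tokenized training corpus would show whether the parser or the wrapper changed the tokenization on any line.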
Step 2: Train the model with the following command:
train-model.perl --source-syntax -max-phrase-length=999 \
  --extract-options="--MaxSpan 999" -lm 0:5:${lm_dir}/lmsri.cn \
  --corpus ${corpus_dir}/train_all --f en --e zh \
  -root-dir $train_dir -external-bin-dir /home/hxshi/moses/tools/bin \
  -mgiza -mgiza-cpus 6 -cores 10 --alignment grow-diag-final-and \
  -score-options ' --GoodTuring'
The training output contains:
 234348 lines  aligned.0.en
 234348 lines  aligned.0.zh
 234348 lines  aligned.grow-diag-final-and
   3252 lines  extract.inv.sorted.gz
   3252 lines  extract.sorted.gz
1724540 lines  lex.e2f
1724540 lines  lex.f2e
     43 lines  moses.ini
   2935 lines  rule-table.gz
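
To put these counts in proportion, here is a minimal Python sketch that counts lines in plain or gzip-compressed files and reports extracted rules per sentence pair. The helper names count_lines and rules_per_sentence are hypothetical, and the paths passed in are whatever the training directory actually contains. With the figures above (3252 extract lines for 234348 sentence pairs) the ratio is about 0.014.

```python
import gzip


def count_lines(path):
    """Count lines in a text file, reading through gzip when the name ends in .gz."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8", errors="replace") as handle:
        return sum(1 for _ in handle)


def rules_per_sentence(extract_path, corpus_path):
    """Return extracted rule lines divided by the number of corpus sentences."""
    sentences = count_lines(corpus_path)
    if sentences == 0:
        return 0.0
    return count_lines(extract_path) / sentences
```

A ratio far below one rule per sentence pair means extraction produced almost nothing for most sentences, which matches the tiny rule table.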

Step 3: Tune with the following command:
mert-moses.pl --inputtype 3 $d_s $d_ref /home/hxshi/moses/tools/moses/bin/moses \
  $d_ini --working-dir ${tuning_dir} --batch-mira --return-best-dev \
  --decoder-flags "-threads 20 -v 0" \
  --rootdir /home/hxshi/moses/tools/moses/scripts \
  -mertdir /home/hxshi/moses/tools/moses/bin --threads 20 --maximum-iterations 30
Tuning stopped during the very first run.

Please find my moses.ini and my tuning output enclosed.
I tried both tree-to-string (En2Ch) and string-to-tree (Ch2En). The results are 
almost the same: nothing can be translated.
Thank you for your patience in reading this mail. I would sincerely appreciate 
a prompt response.


Shi Huaxing
MI&T Lab
School of Computer Science and Technology
Harbin Institute of Technology

Attachment: log.turning
Description: Binary data

Attachment: moses.ini
Description: Binary data
