I am trying to build a syntax-based baseline, using the FBIS data as the training set. However, the rule table I get is far too small, and the resulting system cannot translate anything.
The steps I used to build the baseline are as follows:
Training data: 234,348 lines on both the Chinese side and the English side of the FBIS data.
For example:
En: the effort against corruption has been intensified , echoing the antismuggling campaign .
Zh: 反腐败 力度 加大 , 与 反走私 形成 呼应 之 势 。
I followed the guideline:
Step 0: Parse the English side with ZPar.
Output such as:
(S (NP (NP (DT the) (NN effort)) (PP (IN against) (NP (NN corruption)))) (VP
(VBZ has) (VP (VBN been)
(VP (VBN intensified) (, ,) (S (VP (VBG echoing) (NP (DT the) (JJ
antismuggling) (NN campaign))))))) (. .))
Step 1: Wrap the syntactic tree with
/moses/scripts/training/wrappers/berkeleyparsed2mosesxml.perl
output such as:
<tree label="S"> <tree label="NP"> <tree label="NP"> <tree label="DT"> the
</tree> <tree label="NN"> effort </tree>
</tree> <tree label="PP"> <tree label="IN"> against </tree> <tree label="NP">
<tree label="NN"> corruption </tree>
</tree> </tree> </tree> <tree label="VP"> <tree label="VBZ"> has </tree> <tree
label="VP"> <tree label="VBN"> been
</tree> <tree label="VP"> <tree label="VBN"> intensified </tree> <tree
label=","> , </tree> <tree label="S"> <tree label="VP">
<tree label="VBG"> echoing </tree> <tree label="NP"> <tree label="DT"> the
</tree> <tree label="JJ"> antismuggling </tree>
<tree label="NN"> campaign </tree> </tree> </tree> </tree> </tree> </tree>
</tree> <tree label="."> . </tree> </tree>
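To make the transformation in Step 1 concrete, here is a minimal Python sketch of how a one-line bracketed parse maps onto the `<tree label="...">` markup shown above. It is only an illustration and not the code of berkeleyparsed2mosesxml.perl: the function name `bracketed_to_moses_xml` is hypothetical, and the escaping via `html.escape` is an assumption that may differ from what the real wrapper script does.

```python
import re
from html import escape


def bracketed_to_moses_xml(tree: str) -> str:
    """Convert a one-line bracketed parse into <tree label="..."> markup.

    Simplified illustration only; the real Moses wrapper script may
    escape characters or handle special cases differently.
    """
    # Split into "(", ")" and everything else (labels and words).
    tokens = re.findall(r"\(|\)|[^\s()]+", tree)
    out = []
    expect_label = False
    for tok in tokens:
        if tok == "(":
            # The token right after "(" is the constituent label.
            expect_label = True
        elif tok == ")":
            out.append("</tree>")
        elif expect_label:
            out.append(f'<tree label="{escape(tok, quote=True)}">')
            expect_label = False
        else:
            # A terminal word; escape XML-sensitive characters (assumption).
            out.append(escape(tok))
    return " ".join(out)


if __name__ == "__main__":
    print(bracketed_to_moses_xml("(NP (DT the) (NN effort))"))
```

Running such a check on a few parsed lines is one way to see whether the wrapped corpus really has one well-formed tree per sentence before it goes into training.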
Step 2: Train the model with the following command:
train-model.perl --source-syntax -max-phrase-length=999
--extract-options="--MaxSpan 999" -lm 0:5:${lm_dir}/lmsri.cn --corpus
${corpus_dir}/train_all --f en --e zh
-root-dir $train_dir -external-bin-dir /home/hxshi/moses/tools/bin -mgiza
-mgiza-cpus 6 -cores 10 --alignment grow-diag-final-and -score-options '
--GoodTuring'
What I got:
234348 lines aligned.0.en
234348 lines aligned.0.zh
234348 lines aligned.grow-diag-final-and
3252 lines extract.inv.sorted.gz
3252 lines extract.sorted.gz
1724540 lines lex.e2f
1724540 lines lex.f2e
43 lines moses.ini
2935 lines rule-table.gz
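To put the counts above in proportion, here is a small Python sketch that counts lines in plain or gzipped files and relates the rule-table size to the corpus size. The helper names `count_lines` and `rules_per_sentence` are hypothetical, and the numbers passed in at the end are simply the ones from the listing above.

```python
import gzip


def count_lines(path: str) -> int:
    """Count lines in a plain-text or gzip-compressed file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8", errors="replace") as handle:
        return sum(1 for _ in handle)


def rules_per_sentence(rule_count: int, sentence_count: int) -> float:
    """Average number of rule-table entries per training sentence pair."""
    return rule_count / sentence_count


if __name__ == "__main__":
    # Figures taken from the listing above: 2935 rules, 234348 sentence pairs.
    print(round(rules_per_sentence(2935, 234348), 4))
```

With these figures the ratio comes to about 0.0125, i.e. roughly one rule for every 80 sentence pairs, which shows how little was extracted from the corpus.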
Step 3: Tune with the following command:
mert-moses.pl --inputtype 3 $d_s $d_ref /home/hxshi/moses/tools/moses/bin/moses
$d_ini --working-dir ${tuning_dir} --batch-mira --return-best-dev
--decoder-flags " -threads 20 -v 0 " --rootdir
/home/hxshi/moses/tools/moses/scripts -mertdir
/home/hxshi/moses/tools/moses/bin --threads 20 --maximum-iterations 30
It stopped during the very first run.
Please find enclosed my moses.ini and my tuning output.
I tried both tree-to-string (En2Ch) and string-to-tree (Ch2En). The results are almost the same: nothing can be translated.
Thank you for your patience in reading this mail. I am sincerely and urgently waiting for your response.
Shi Huaxing
MI&T Lab
School of Computer Science and Technology
Harbin Institute of Technology
