Hi,

Yes, we use berkeleyparsed2mosesxml.perl.

Typically, these kinds of errors happen if the EMS was misconfigured. But I don't even know whether you used the EMS at all. Try to find the command line that executed the phrase extraction binary in your logs, then look at the parallel corpus and word alignment that were passed to it. If the data has annotation issues, track them back to the previous steps. If the data looks okay, then maybe we have a bug in the extractor.

This is SAMT, not GHKM, right? We mostly use GHKM syntax in Edinburgh at the moment. However, Hieu should know more in case there has been a recent modification to SAMT grammar extraction.

Cheers,
Matthias

On Mon, 2015-01-26 at 10:02 +0800, hxshi wrote:
>
> Thank you for your responses!
> I found tokens like the following in my model:
>
> [X][HEAD] 高度 [X][POS] 建设 [X] ||| [X][HEAD] label="HEAD"> <tree label="MOD"> last [X][POS] [TH] ||| 0.00525464 0.122788 0.0475 1 ||| 1-0 2-1 3-5 4-4 ||| 0.244314 0.027027 0.027027 ||| |||
>
> So, is that the reason? I used the script in Moses (berkeleyparsed2mosesxml.perl) to convert my parsing results to Moses format. Did you use that script? Or which script did you use for the format conversion?
>
> ______________________________________________________________________
> Shi Huaxing
>
> MI&T Lab
> School of Computer Science and Technology
> Harbin Institute of Technology
>
> From: Matthias Huck
> Date: 2015-01-26 07:53
> To: hxshi
> CC: moses-support
> Subject: Re: Re: [Moses-support] I can't get any output from my syntactic baseline.
>
> Hi,
>
> I'm not fully sure, but it looks to me like something is wrong with your model. The tree annotation was probably flawed, and you ended up aligning and extracting annotation as proper words. Have a look at the rules in your phrase table and the parallel corpora it was extracted from. Search for tokens in the phrase table that shouldn't be there (like "<tree" and "</tree>").
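The search suggested above can be scripted over the (possibly gzipped) rule table. This is a minimal sketch, not a Moses tool: the names `leaked_markup` and `scan` are hypothetical, and it assumes that the first two `|||`-separated fields of each line are the source and target sides and that leaked annotation looks like the `<tree`, `</tree>` and `label="...">` tokens seen in this thread.

```python
import gzip
import re
import sys

# Tokens that belong to the <tree label="..."> annotation rather than to the text.
MARKUP = re.compile(r'<tree|</tree>|label="[^"]*">?')


def leaked_markup(rule_line: str) -> list:
    """Return annotation-like tokens found on the source or target side of one rule."""
    fields = [f.strip() for f in rule_line.split("|||")]
    if len(fields) < 2:
        return []
    tokens = fields[0].split() + fields[1].split()
    return [t for t in tokens if MARKUP.fullmatch(t)]


def scan(path: str, show: int = 20) -> int:
    """Print the first `show` offending rules and return how many rules are affected."""
    opener = gzip.open if path.endswith(".gz") else open
    hits = 0
    with opener(path, "rt", encoding="utf-8") as table:
        for number, line in enumerate(table, start=1):
            bad = leaked_markup(line)
            if bad:
                hits += 1
                if hits <= show:
                    print(f"line {number}: {' '.join(bad)}")
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    total = scan(sys.argv[1])
    print(f"{total} rule(s) contain tree markup tokens")
```

Run it on the model's rule-table.gz. A non-zero count would suggest, as Matthias suspects, that the annotation leaked in during training rather than at decoding time.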
>
> An alternative explanation might be that the input you're translating got messed up with flawed annotation and those tokens are passed through as unknown words. If you're doing string-to-tree, though, there's no need to parse the source side. You're setting inputtype=3, which doesn't make much sense given that fact (3 is tree input, 0 is text input).
>
> Cheers,
> Matthias
>
>
> On Sun, 2015-01-25 at 10:07 +0800, hxshi wrote:
> > Thank you for your advice.
> > It can translate something now,
> > but when I run it, the translations are as follows:
> >
> > Yili </tree> <tree <tree <tree label="COND"> <tree <tree propaganda
> > activities
> > label="AG"> Urumqi <tree label="PROP"> electricity -LRB- <tree 樊英利 </tree> 丁刚 </tree> 李秀芩 -RRB- label="TH"> <tree label="DET"> </tree> </tree> <tree <tree <tree
> > label="VACTN">
> > <tree Yili creatively </tree> <tree <tree label="COND"> <tree <tree
> > propaganda activities
> > <tree label="PROP"> <tree <tree <tree </tree> label="PROP"> label="MOD">
> > voice accorded a warm welcome
> > Located Xibei border </tree> Yili <tree </tree> </tree> <tree label="TH">
> > label="REL">
> > <tree <tree <tree <tree label="TH"> label="HEAD"> <tree label="HEAD"> <tree
> > Yili </tree>
> > <tree <tree </tree> <tree label="HEAD"> </tree> <tree
> >
> > They are not what I expected. What is the problem? How can I get the output as a plain string? By the way, the output is not even a tree.
> >
> > From: Matthias Huck
> > Date: 2015-01-25 04:04
> > To: hxshi
> > CC: moses-support
> > Subject: Re: [Moses-support] I can't get any output from my syntactic baseline.
> >
> > Hi,
> >
> > As Rico pointed out before: the glue rules are missing.
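Both configuration points raised in these replies (the input type and the missing glue rules) can be checked mechanically against moses.ini. The sketch below is hypothetical, not part of Moses: it groups the ini by `[section]`, warns when `[inputtype]` is not 0 (text input, which is what an unparsed string-to-tree source needs), and warns when the number of `[max-chart-span]` values differs from the number of `PhraseDictionary*` feature lines. That second check is an assumption: it presumes, as in configs generated with a glue grammar, that each phrase table, the glue grammar included, gets one `[max-chart-span]` value. In the moses.ini quoted further down, two span values (20 and 1000) sit next to a single phrase table.

```python
import re
import sys
from collections import defaultdict
from typing import Dict, List

SECTION = re.compile(r"\[([^\]]+)\]")


def read_sections(text: str) -> Dict[str, List[str]]:
    """Group the non-empty, non-comment lines of a moses.ini by [section]."""
    sections: Dict[str, List[str]] = defaultdict(list)
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        header = SECTION.fullmatch(line)
        if header:
            current = header.group(1)
        elif current is not None:
            sections[current].append(line)
    return sections


def check_chart_ini(text: str) -> List[str]:
    sections = read_sections(text)
    warnings = []
    # [inputtype]: 0 = plain text, 3 = tree input (as stated in the thread).
    inputtype = sections.get("inputtype", ["0"])[0]
    if inputtype != "0":
        warnings.append(f"inputtype={inputtype}, but an unparsed source needs 0 (text input)")
    # Assumption: one [max-chart-span] value per phrase table, glue grammar included.
    tables = [line for line in sections.get("feature", []) if line.startswith("PhraseDictionary")]
    spans = sections.get("max-chart-span", [])
    if spans and len(spans) != len(tables):
        warnings.append(
            f"{len(spans)} max-chart-span value(s) but {len(tables)} phrase table(s); "
            "is the glue grammar missing?"
        )
    return warnings


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as ini:
        for warning in check_chart_ini(ini.read()):
            print("WARNING:", warning)
```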
> >
> > Cheers,
> > Matthias
> >
> >
> > On Sun, 2015-01-25 at 03:25 +0800, hxshi wrote:
> > > I can't get any output from my syntactic baseline. Does anybody know what may be wrong?
> > >
> > > I trained a string-to-tree baseline and got a rule table like this:
> > >
> > > % [X][TH] 相当于 [X][RA] [X] ||| [X][TH] is [X][RA] [PROP] ||| 1.67586e-05 4.51106e-08 0.0475 0.177966 ||| 1-0 2-1 3-2 ||| 2083.76 0.735177 0.735177 ||| |||
> > > % [X][TH] 相当于 [X][RA] 。 [X] ||| [X][TH] is [X][RA] [PROP] ||| 2.06243e-06 6.65709e-09 0.0475 0.177966 ||| 1-0 2-1 3-2 ||| 2083.76 0.0904762 0.0904762 ||| |||
> > > % [X][TH] 相当于 [X][RA] 于 [X] ||| [X][TH] is [X][RA] [PROP] ||| 1.30259e-06 4.61662e-11 0.0475 0.177966 ||| 1-0 2-1 3-2 ||| 2083.76 0.0571429 0.0571429 ||| |||
> > >
> > > And I always get no output when I use this baseline. For example:
> > >
> > > input: 3 月
> > >
> > > output on screen:
> > > 3 月
> > > Translating line 2 in thread id 47362102691584
> > > Line 2: Initialize search took 0.000 seconds total
> > > Translating: <s> 3 月 </s> ||| [0,0]=X (1) [0,1]=X (1) [0,2]=X (1) [0,3]=X (1) [1,1]=X (1) [1,2]=X (1) [1,3]=X (1) [2,2]=X (1) [2,3]=X (1) [3,3]=X (1)
> > >
> > > 0 1 2 3
> > > 0 8 8 0
> > > 0 29 0
> > > 0 0
> > > 0
> > > Line 2: Additional reporting took 0.000 seconds total
> > > Line 2: Translation took 0.003 seconds total
> > > Translation took 0.000 seconds
> > >
> > > Do you know what may be wrong with my baseline?
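For reference, each rule-table line quoted above has `|||`-separated fields: source side, target side, feature scores, alignment points, counts, and two trailing fields that are empty here. The last token on each side is the left-hand-side label, so the first rule rewrites source `[X]` as target `[PROP]`. The parser below is an illustrative sketch (the names `Rule`, `parse_rule` and `nonterminal_links` are hypothetical, not a Moses API); it assumes the alignment indices count right-hand-side tokens only, which is consistent with the `1-0 2-1 3-2` points in these rules.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Rule:
    source: List[str]                 # source right-hand side (LHS label removed)
    target: List[str]                 # target right-hand side (LHS label removed)
    source_lhs: str                   # e.g. "X"
    target_lhs: str                   # e.g. "PROP"
    scores: List[float]
    alignment: List[Tuple[int, int]]


def is_nonterminal(token: str) -> bool:
    # Nonterminals are bracketed, e.g. "[X][TH]"; terminals such as "%" are not.
    return len(token) > 2 and token.startswith("[") and token.endswith("]")


def parse_rule(line: str) -> Rule:
    fields = [f.strip() for f in line.split("|||")]
    src, tgt = fields[0].split(), fields[1].split()
    scores = [float(x) for x in fields[2].split()]
    alignment = [(int(s), int(t)) for s, t in (p.split("-") for p in fields[3].split())]
    return Rule(
        source=src[:-1],
        target=tgt[:-1],
        source_lhs=src[-1].strip("[]"),
        target_lhs=tgt[-1].strip("[]"),
        scores=scores,
        alignment=alignment,
    )


def nonterminal_links(rule: Rule) -> List[Tuple[str, str]]:
    """Aligned (source, target) pairs whose source token is a nonterminal."""
    return [(rule.source[s], rule.target[t]) for s, t in rule.alignment
            if is_nonterminal(rule.source[s])]


if __name__ == "__main__":
    example = ("% [X][TH] 相当于 [X][RA] [X] ||| [X][TH] is [X][RA] [PROP] ||| "
               "1.67586e-05 4.51106e-08 0.0475 0.177966 ||| 1-0 2-1 3-2 ||| "
               "2083.76 0.735177 0.735177 ||| |||")
    rule = parse_rule(example)
    print(rule.source_lhs, rule.target_lhs)  # X PROP
    print(nonterminal_links(rule))  # [('[X][TH]', '[X][TH]'), ('[X][RA]', '[X][RA]')]
```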
> > >
> > > I run the decoder with:
> > > moses_chart -T -f moses.ini
> > >
> > > I trained this baseline with:
> > > train_model.pl --glue-grammar --target-syntax -max-phrase-length=999 --extract-options="--NonTermConsecSource --MinHoleSource 1 --MaxSpan 999 --MinWords 0 --MaxNonTerm 3" -lm 0:5:lmsri.en --corpustrain_case --f zh --e en -root-dir train_dir -external-bin-dir bin -mgiza -mgiza-cpus 6 -cores 10 --alignment grow-diag-final-and -score-options '--GoodTuring'
> > >
> > > The moses.ini is as follows:
> > > #########################
> > >
> > > # input factors
> > > [input-factors]
> > > 0
> > >
> > > # mapping steps
> > > [mapping]
> > > 0 T 0
> > >
> > > [cube-pruning-pop-limit]
> > > 1000
> > >
> > > [non-terminals]
> > > X
> > >
> > > [search-algorithm]
> > > 3
> > >
> > > [inputtype]
> > > 3
> > >
> > > [max-chart-span]
> > > 20
> > > 1000
> > >
> > > # feature functions
> > > [feature]
> > > UnknownWordPenalty
> > > WordPenalty
> > > PhrasePenalty
> > > PhraseDictionaryMemory name=TranslationModel0 num-features=4 path=/home/workspace/moses-fbis-case-s2t-ch2en/training_dir/model/rule-table.gz input-factor=0 output-factor=0
> > > KENLM name=LM0 factor=0 path=/home/workspace/data-lm/lmsri.en order=5
> > >
> > > # dense weights for feature functions
> > > [weight]
> > > UnknownWordPenalty0= 1
> > > WordPenalty0= -1
> > > PhrasePenalty0= 0.2
> > > TranslationModel0= 0.2 0.2 0.2 0.2
> > > LM0= 0.5
> > >
> > > _______________________________________________
> > > Moses-support mailing list
> > > [email protected]
> > > http://mailman.mit.edu/mailman/listinfo/moses-support
> >
> >
> > --
> > The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
