Hi Barry,

Attached is the zgrep result. I found that a few bytes in the middle of line 61 are corrupted. Is that a Moses problem, or does my memory have a problem?
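A minimal sketch of how corrupted bytes like these could be located, assuming the n-best list is a gzip-compressed text file that should be UTF-8 throughout. The function name `find_invalid_utf8` is made up for illustration; it is not part of Moses.

```python
import gzip
import sys


def find_invalid_utf8(path):
    """Yield (line_number, byte_offset, reason) for every line of a gzip
    file that fails to decode as UTF-8. byte_offset is relative to the
    start of that line."""
    with gzip.open(path, "rb") as handle:
        for line_number, raw in enumerate(handle, start=1):
            try:
                raw.decode("utf-8")
            except UnicodeDecodeError as err:
                yield line_number, err.start, err.reason


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python check_utf8.py 16-run7.best100.out.gz
    for line_number, offset, reason in find_invalid_utf8(sys.argv[1]):
        print(f"line {line_number}, byte {offset}: {reason}")
```

Only the first bad byte of each line is reported, which is enough to see whether the damage sits inside a feature name or elsewhere in the entry.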
I also checked the other files using iconv; they are all valid UTF-8.

On 2016-01-18 19:32, Barry Haddow wrote:
> Hi Dingyuan
>
> Yes, that's very possible. The error could be in extracting features.dat
> from the nbest list. Are you able to post the nbest list? Or at least
> the entries for sentence 16?
>
> Run something like
>
> zgrep "^16 " tuning/tmp.1/run7.best100.out.gz
>
> cheers - Barry
>
> On 18/01/16 11:24, Dingyuan Wang wrote:
>> Hi Barry,
>>
>> I reran the EMS after the first email and then posted the more recent
>> results, so the line changed.
>>
>> I just use the latest code and the EMS script, with pretty much all
>> default settings. The EMS setting is:
>>
>> sparse-features = "target-word-insertion top 50, source-word-deletion top 50, word-translation top 50 50, phrase-length"
>>
>> I suspect there is something unexpected in the extractor.
>>
>>
>> On 2016-01-18 19:03, Barry Haddow wrote:
>>> Hi Dingyuan
>>>
>>> In fact it is neither the sparse features nor the Asian characters that
>>> are the problem. The offending line has 17 dense features, yet your model
>>> has 14 dense features.
>>>
>>> The string "1 1 1" appears directly after the language model feature in
>>> line 1694, in your attachment, adding the extra 3 features. Note that
>>> this is not the line you mentioned in your earlier email.
>>>
>>> I have no idea why there are extra features. Have you made changes to
>>> any of the core Moses features?
>>>
>>> best wishes
>>> Barry
>>>
>>> The offending line:
>>> what(): Error in line "-5.44027 0 0 -5.34901 0 0 0 -224.872 1 1 1 -39 18 -26.2331 -40.6736 -44.3698 -82.5072 WT_,~,=3 WT_:~:=1 WT_“~“=1 WT_”~”=1 WT_曰~说=1 PL_s3=5 PL_3,2=2 PL_3,3=3 PL_2,3=4 PL_t3=7 PL_s1=5 PL_1,2=2 PL_1,1=3 PL_t1=4 PL_2,2=3 PL_t2=7 PL_s2=8 PL_2,1=1 WT_有~有=1 WT_!~!=1 WT_其~的=1 WT_其~他=1 WT_不~也=1 WT_不~没=1 WT_而~而=1 WT_而~却=1 WT_祖逖~逖=1 WT_祖逖~祖=1 WT_逖~祖=1 WT_逖~逖=1 WT_大~大江=1 WT_者~的=1 WT_者~人=1 WT_江~大江=1 WT_渡~渡过=1 WT_复~又=1 WT_余~有=1 WT_誓~发誓=1 WT_楫~木=1 WT_江~长江=1 WT_击~击=1 WT_将~带领=1 WT_济~成功=1 WT_中原~中原=1 WT_清~廓清=1 WT_如~像=1 WT_楫~戢=1 WT_能~能=1 WT_中~中流=1 WT_流~中流=1 WT_部曲~部下=1 " of ...
>>>
>>>
>>> On 18/01/16 10:37, Dingyuan Wang wrote:
>>>> Hi,
>>>>
>>>> I've attached that. The line number is 1694.
>>>>
>>>> On 2016-01-18 16:43, Barry Haddow wrote:
>>>>> Hi Dingyuan
>>>>>
>>>>> Is it possible to attach the features.dat file that is causing the
>>>>> error? Almost certainly Moses is failing to parse the line because of
>>>>> the Asian characters in the feature names.
>>>>>
>>>>> cheers - Barry
>>>>>
>>>>> On 16/01/16 15:58, Dingyuan Wang wrote:
>>>>>> I ran
>>>>>>
>>>>>> ~/software/moses/bin/kbmira -J 75 --dense-init run7.dense \
>>>>>>   --sparse-init run7.sparse-weights \
>>>>>>   --ffile run1.features.dat --ffile run2.features.dat \
>>>>>>   --ffile run3.features.dat --ffile run4.features.dat \
>>>>>>   --ffile run5.features.dat --ffile run6.features.dat \
>>>>>>   --ffile run7.features.dat \
>>>>>>   --scfile run1.scores.dat --scfile run2.scores.dat \
>>>>>>   --scfile run3.scores.dat --scfile run4.scores.dat \
>>>>>>   --scfile run5.scores.dat --scfile run6.scores.dat \
>>>>>>   --scfile run7.scores.dat -o /tmp/mert.out
>>>>>>
>>>>>> in the tuning/tmp.1 directory, which reliably replicates the
>>>>>> error.
>>>>>>
>>>>>> On 2016-01-16 23:42, Hieu Hoang wrote:
>>>>>>> The mert script prints out every command it runs.
>>>>>>> You should be able to replicate the error by running the last command.
>>>>>>>
>>>>>>> On 16 Jan 2016 14:18, "Dingyuan Wang" <[email protected]> wrote:
>>>>>>>
>>>>>>>     Sorry, but I can't reliably replicate the same problem when running
>>>>>>>     TUNING_tune.1 alone. There is no '_' character in the test set or
>>>>>>>     the top50 list.
>>>>>>>
>>>>>>>     I'm using sparse-features = "target-word-insertion top 50, source-word-deletion top 50, word-translation top 50 50, phrase-length"
>>>>>>>
>>>>>>>     I've attached some related files from EMS and the EMS config.
>>>>>>>
>>>>>>>     https://mega.nz/#!xs0SFKxL!M_RTBp1JGX24-b4xlYYLP-bLXKiC_Sl-p96x55avAB4
>>>>>>>
>>>>>>>
>>>>>>>     On 2016-01-16 02:45, Hieu Hoang wrote:
>>>>>>>     > Could you make your model files available for download so I can
>>>>>>>     > replicate this problem?
>>>>>>>     >
>>>>>>>     > It seems like you're using a feature function with sparse scores. I
>>>>>>>     > think the character '_' must be escaped.
>>>>>>>     >
>>>>>>>     >
>>>>>>>     > On 12/01/16 04:00, Dingyuan Wang wrote:
>>>>>>>     >> Hi all,
>>>>>>>     >>
>>>>>>>     >> I'm using EMS to run experiments. Every time, kbmira died with
>>>>>>>     >> SIGABRT when tuning in one direction, while tuning in the opposite
>>>>>>>     >> direction (same config and test set) was successful.
>>>>>>>     >>
>>>>>>>     >> The mert.log (stderr) shows the following:
>>>>>>>     >>
>>>>>>>     >>
>>>>>>>     >> kbmira with c=0.01 decay=0.999 no_shuffle=0
>>>>>>>     >> Initialising random seed from system clock
>>>>>>>     >> Found 15323 initial sparse features
>>>>>>>     >> ....terminate called after throwing an instance of
>>>>>>>     >> 'MosesTuning::FileFormatException'
>>>>>>>     >> what(): Error in line "-4.51933 0 0 -6.09733 0 0 0 -121.556 2 -20 12 -31.6201 -38.5211 -26.5112 -60.6166 WT_,~,=2 WT_?~?=1 PL_s1=4 PL_s3=1 PL_3,3=1 PL_2,2=3 PL_1,2=1 PL_2,1=3 PL_t1=6 PL_t2=4 PL_t3=2 PL_2,3=1 PL_s2=7 PL_1,1=3 WT_未~没有=1 WT_何~怎么=1 WT_何~能=1 WT_方~正在=1 WT_又~还=1 WT_君~您=2 WT_趣~向=1 WT_趣~奔=1 WT_有~没有=1 WT_往~去=1 WT_官~官员=1 WT_假~借=1 WT_檄~檄文=1 WT_文~文告=1 WT_上~上级=1 WT_为~呢=1 WT_在~正在=1 " of run7.features.dat
>>>>>>>     >> Aborted
>>>>>>>     >>
>>>>>>>     >>
>>>>>>>     >> Since run7.scores.dat is generated by scripts, I don't think I am
>>>>>>>     >> responsible for the bad format. Last time it also died; I removed
>>>>>>>     >> the likely offending line from the test set, but this time another
>>>>>>>     >> line appears.
>>>>>>>     >>
>>>>>>>     >> --
>>>>>>>     >> Dingyuan Wang
>>>>>>>     >> _______________________________________________
>>>>>>>     >> Moses-support mailing list
>>>>>>>     >> [email protected]
>>>>>>>     >> http://mailman.mit.edu/mailman/listinfo/moses-support
>>>>>>>     >
>>>>>>>
>>>>>>>     --
>>>>>>>     Dingyuan Wang (gumblex)

--
Dingyuan Wang (gumblex)
16-run7.best100.out.gz
Description: application/gzip
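
The dense-feature mismatch described in the quoted messages (17 values on the offending line against 14 in the model) can be checked mechanically. This is a rough sketch and not Moses code. It assumes, based only on the lines quoted in this thread, that each data line starts with whitespace-separated numeric dense values followed by `name=value` sparse features. The helper names `count_dense` and `find_mismatches` are hypothetical.

```python
def is_number(token):
    """Return True if the token parses as a float."""
    try:
        float(token)
        return True
    except ValueError:
        return False


def count_dense(line):
    """Count the leading numeric tokens on a line, taken here to be the
    dense feature values (assumption: sparse features follow as name=value)."""
    count = 0
    for token in line.split():
        if not is_number(token):
            break
        count += 1
    return count


def find_mismatches(lines, expected):
    """Return (line_number, dense_count) for each data line whose number of
    dense values differs from `expected`. Lines that do not start with a
    number are treated as non-data lines and skipped."""
    mismatches = []
    for line_number, line in enumerate(lines, start=1):
        dense = count_dense(line)
        if dense and dense != expected:
            mismatches.append((line_number, dense))
    return mismatches
```

Running `find_mismatches(open("run7.features.dat", encoding="utf-8"), expected=14)` would list every line whose dense count is not 14, so the lines can be compared with the corresponding n-best entries.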
