Hi Dingyuan
The extractor expects feature names to contain an underscore (not sure
exactly why) but some of yours don't, and Moses skips them, interpreting
their values as extra dense features.
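As a rough way to spot such names, the Python sketch below lists the sparse feature names on one features.dat line that contain no underscore. It assumes sparse features are whitespace-separated name=value tokens, as in the error output quoted in this thread. The function name and the "@x~y" sample token are hypothetical, and this is not the parser that the Moses extractor or kbmira actually uses.

```python
def names_without_underscore(line):
    """Return sparse feature names in one features.dat line that lack '_'."""
    bad = []
    for token in line.split():
        # Assumption: a sparse feature is written as name=value. Split on the
        # last '=' so that names which themselves contain '=' stay intact.
        name, sep, _value = token.rpartition("=")
        # Dense values have no '=' at all, so sep is empty and they are skipped.
        if sep and "_" not in name:
            bad.append(name)
    return bad


# Hypothetical sample line: three dense values, two well-formed names, one bad name.
sample = "-5.44027 0 0 WT_,~,=3 PL_s3=5 @x~y=1"
print(names_without_underscore(sample))
```

Running this over every line of a features.dat file should show which entries would be misread, provided the name=value assumption holds.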
The attached screenshot shows my view of the offending names. The ones
starting with the "@" are the problem. So it does look like the nbest
list is corrupted. Can you run the decoder on just that sentence, to
create an uncompressed version of the nbest list?
cheers - Barry
On 18/01/16 12:02, Dingyuan Wang wrote:
Hi Barry,
Attached is the zgrep result.
I found that a few bytes in the middle of line 61 are corrupted. Is that
a Moses problem, or does my memory have a problem?
I also checked other files using iconv, they are all OK in UTF-8.
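A check similar to the iconv one can be run directly on the compressed nbest list. The sketch below is only an illustration: the function name is hypothetical, and it reports only lines that are not valid UTF-8. Corrupted bytes that still happen to decode as valid UTF-8 would not be caught.

```python
import gzip


def invalid_utf8_lines(path):
    """Yield (line_number, offset_in_line) for lines that fail to decode as UTF-8."""
    with gzip.open(path, "rb") as handle:
        for number, raw in enumerate(handle, 1):
            try:
                raw.decode("utf-8")
            except UnicodeDecodeError as err:
                # err.start is the byte offset of the first undecodable byte.
                yield number, err.start


# Example use, with the path taken from this thread:
# for number, offset in invalid_utf8_lines("tuning/tmp.1/run7.best100.out.gz"):
#     print(number, offset)
```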
On 18/01/16 19:32, Barry Haddow wrote:
Hi Dingyuan
Yes, that's very possible. The error could be in extracting features.dat
from the nbest list. Are you able to post the nbest list? Or at least
the entries for sentence 16?
Run something like
zgrep "^16 " tuning/tmp.1/run7.best100.out.gz
cheers - Barry
On 18/01/16 11:24, Dingyuan Wang wrote:
Hi Barry,
I reran the EMS after the first email and then posted the more recent
results, so the line number changed.
I'm just using the latest code and the EMS script, pretty much with the
default settings. The EMS setting is:
sparse-features = "target-word-insertion top 50, source-word-deletion
top 50, word-translation top 50 50, phrase-length"
I suspect there is something unexpected in the extractor.
On 18/01/16 19:03, Barry Haddow wrote:
Hi Dingyuan
In fact it is not the sparse features nor the Asian characters that are
the problem. The offending line has 17 dense features, yet your model
has 14 dense features.
The string "1 1 1" appears directly after the language model feature in
line 1694, in your attachment, adding the extra 3 features. Note that
this is not the line you mentioned in your earlier email.
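The count can be checked mechanically. The sketch below counts the leading tokens without an "=" on each line and reports lines whose count differs from the expected number of dense features. It assumes that dense values come first and that sparse features follow as name=value tokens, as in the quoted error output. The function names are hypothetical, and any header or footer lines in a real features.dat would need to be skipped before calling it.

```python
def count_dense(line):
    """Count the leading tokens of a line that carry no '=' (assumed dense values)."""
    count = 0
    for token in line.split():
        if "=" in token:
            break  # first sparse name=value token ends the dense block
        count += 1
    return count


def lines_with_wrong_dense_count(lines, expected):
    """Return (line_number, dense_count) for lines whose count differs from expected."""
    wrong = []
    for number, line in enumerate(lines, 1):
        found = count_dense(line)
        if found != expected:
            wrong.append((number, found))
    return wrong


# Start of the offending line quoted in this thread: 17 dense values, 14 expected.
offending = ("-5.44027 0 0 -5.34901 0 0 0 -224.872 1 1 1 -39 18 "
             "-26.2331 -40.6736 -44.3698 -82.5072 WT_,~,=3")
print(lines_with_wrong_dense_count([offending], expected=14))
```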
I have no idea why there are extra features. Have you made changes to
any of the core Moses features?
best wishes
Barry
The offending line:
what(): Error in line "-5.44027 0 0 -5.34901 0 0 0 -224.872 1 1 1 -39
18 -26.2331 -40.6736 -44.3698 -82.5072 WT_,~,=3 WT_:~:=1 WT_“~“=1
WT_”~”=1 WT_曰~说=1 PL_s3=5 PL_3,2=2 PL_3,3=3 PL_2,3=4 PL_t3=7 PL_s1=5
PL_1,2=2 PL_1,1=3 PL_t1=4 PL_2,2=3 PL_t2=7 PL_s2=8 PL_2,1=1 WT_有~有=1
WT_!~!=1 WT_其~的=1 WT_其~他=1 WT_不~也=1 WT_不~没=1 WT_而~而=1
WT_而~却=1 WT_祖逖~逖=1 WT_祖逖~祖=1 WT_逖~祖=1 WT_逖~逖=1 WT_大~大江=1
WT_者~的=1 WT_者~人=1 WT_江~大江=1 WT_渡~渡过=1 WT_复~又=1 WT_余~有=1
WT_誓~发誓=1 WT_楫~木=1 WT_江~长江=1 WT_击~击=1 WT_将~带领=1
WT_济~成功=1 WT_中原~中原=1 WT_清~廓清=1 WT_如~像=1 WT_楫~戢=1
WT_能~能=1 WT_中~中流=1 WT_流~中流=1 WT_部曲~部下=1 " of ...
On 18/01/16 10:37, Dingyuan Wang wrote:
Hi,
I've attached that. The line number is 1694.
On 18/01/16 16:43, Barry Haddow wrote:
Hi Dingyuan
Is it possible to attach the features.dat file that is causing the
error? Almost certainly Moses is failing to parse the line because of
the Asian characters in the feature names,
cheers - Barry
On 16/01/16 15:58, Dingyuan Wang wrote:
I ran
~/software/moses/bin/kbmira -J 75 --dense-init run7.dense \
  --sparse-init run7.sparse-weights \
  --ffile run1.features.dat --ffile run2.features.dat \
  --ffile run3.features.dat --ffile run4.features.dat \
  --ffile run5.features.dat --ffile run6.features.dat \
  --ffile run7.features.dat \
  --scfile run1.scores.dat --scfile run2.scores.dat \
  --scfile run3.scores.dat --scfile run4.scores.dat \
  --scfile run5.scores.dat --scfile run6.scores.dat \
  --scfile run7.scores.dat -o /tmp/mert.out
in the tuning/tmp.1 directory, which will certainly replicate the error.
On 16/01/16 23:42, Hieu Hoang wrote:
The mert script prints out every command it runs. You should be able to
replicate the error by running the last command.
On 16 Jan 2016 14:18, "Dingyuan Wang" wrote:
Sorry, but I can't reliably replicate the same problem when running
TUNING_tune.1 alone. There is no character '_' in the test set or top50
list.
I'm using sparse-features = "target-word-insertion top 50,
source-word-deletion top 50, word-translation top 50 50, phrase-length"
I've attached some related files from EMS and the EMS config.
https://mega.nz/#!xs0SFKxL!M_RTBp1JGX24-b4xlYYLP-bLXKiC_Sl-p96x55avAB4
On 16/01/16 02:45, Hieu Hoang wrote:
> Could you make your model files available for download so I can
> replicate this problem?
>
> It seems like you're using a feature function with sparse scores. I
> think the character '_' must be escaped.
>
>
> On 12/01/16 04:00, Dingyuan Wang wrote:
>> Hi all,
>>
>> I'm using EMS to run experiments. Every time, kbmira died with
>> SIGABRT when tuning in one direction, while tuning in the opposite
>> direction (same config and test set) was successful.
>>
>> The mert.log (stderr) shows the following:
>>
>>
>> kbmira with c=0.01 decay=0.999 no_shuffle=0
>> Initialising random seed from system clock
>> Found 15323 initial sparse features
>> ....terminate called after throwing an instance of
>> 'MosesTuning::FileFormatException'
>> what(): Error in line "-4.51933 0 0 -6.09733 0 0 0 -121.556 2 -20 12
>> -31.6201 -38.5211 -26.5112 -60.6166 WT_,~,=2 WT_?~?=1 PL_s1=4
>> PL_s3=1 PL_3,3=1 PL_2,2=3 PL_1,2=1 PL_2,1=3 PL_t1=6 PL_t2=4 PL_t3=2
>> PL_2,3=1 PL_s2=7 PL_1,1=3 WT_未~没有=1 WT_何~怎么=1 WT_何~能=1
>> WT_方~正在=1 WT_又~还=1 WT_君~您=2 WT_趣~向=1 WT_趣~奔=1
>> WT_有~没有=1 WT_往~去=1 WT_官~官员=1 WT_假~借=1 WT_檄~檄文=1
>> WT_文~文告=1 WT_上~上级=1 WT_为~呢=1 WT_在~正在=1 " of run7.features.dat
>> Aborted
>>
>>
>> Since run7.scores.dat is generated by scripts, I don't think I'm
>> responsible for the bad format. The last time it died, I removed the
>> likely offending line from the test set, but this time another line
>> appears.
>>
>> --
>> Dingyuan Wang
>> _______________________________________________
>> Moses-support mailing list
>> [email protected] <mailto:[email protected]>
>> http://mailman.mit.edu/mailman/listinfo/moses-support
>
--
Dingyuan Wang (gumblex)
--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.