I think something is wrong with how the sparse feature parameters are
printed. I can reproduce it on two computers, with about one error in
every ten batches. I checked the n-best output using the attached script.
2017-04-12 22:25, Dingyuan Wang:
> I can't find anything wrong with this sentence in the test set. The other
> candidates for this sentence are fine in the same batch of output. The
> problem occurs at random (random sentence and candidate) during tuning.
>
> 2017-04-12 21:48, Hieu Hoang:
>> It looks like there is a phrase of length 0, hence ' = 1'.
>>
>> Check that your data has been cleaned and encoded correctly.
>>
>> * Looking for MT/NLP opportunities *
>> Hieu Hoang
>> http://moses-smt.org/
>>
>>
>> On 12 April 2017 at 13:36, Dingyuan Wang <[email protected]> wrote:
>>
>> Dear all,
>>
>> I came across exactly the same problem a year ago (see this thread):
>>
>> https://www.mail-archive.com/[email protected]/msg13673.html
>>
>> Moses repeatedly, and at random, writes corrupted best100 output that
>> crashes the subsequent kbmira tuning. For example:
>>
>> 45 ||| “ 愿 以 车 马 衣 裘 等 皆 与 朋 友 共 分 共 , 则 皆 敝 之 亦
>> 无 所 恨
>> 。 ” ||| LexicalReordering0= -6.1176 0 0 -6.58298 0 0 Distortion0= 0
>> LM0= -115.094 TWI_,= 0 SWD_OTHER= 2 WT_都~皆= 2 WT_OTHER~OTHER= 12
>> WT_也~OTHER= 1 WT_把~以= 1 WT_,~,= 1 WT_等~OTHER= 1 WT_OTHER~所= 1
>> WT_OTHER~无= 0 WT_OTHER~以= 1 WT_没有~无= 1 WT_了~之= 1 WT_”~”= 1
>> WT_。~。= 1
>> WT_也~而= 0 = 1 WT_OTHER~则= 1 WT_和~与= 1 WT_了~OTHER= 0 PL_t2= 5
>> PL_s2= 4
>> PL_1,2= 2 PL_3,4= 0 PL_s3= 1 WordPenalty0= -26 PhrasePenalty0= 21
>> TranslationModel0= -66.0904 -70.4587 -24.5341 -28.4086 ||| -15.012
>>
>> The error is in "WT_也~而= 0 = 1", where " = 1" is a feature with an
>> empty name. kbmira then fails:
>>
>> kbmira with c=0.01 decay=0.999 no_shuffle=0
>> Initialising random seed from system clock
>> terminate called after throwing an instance of
>> 'MosesTuning::FileFormatException'
>> what(): Error in line "-6.1176 0 0 -6.58298 0 0 0 -115.094 1 -26 21
>> -66.0904 -70.4587 -24.5341 -28.4086 SWD_OTHER=2 WT_,~,=1
>> WT_OTHER~OTHER=12 PL_t2=5 PL_s3=1 PL_s2=4 PL_1,2=2 WT_”~”=1 WT_。~。=1
>> WT_没有~无=1 WT_了~之=1 WT_OTHER~以=1 WT_都~皆=2 WT_OTHER~所=1
>> WT_OTHER~则=1
>> WT_把~以=1 WT_和~与=1 WT_等~OTHER=1 WT_也~OTHER=1 " of run1.features.dat
>> Aborted (core dumped)
>>
>> The system is Debian 9 (stretch/testing) with GCC 6.3.0, running the
>> latest Moses git checkout.
>>
>> --
>> Dingyuan Wang
>> _______________________________________________
>> Moses-support mailing list
>> [email protected]
>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>
>>
>
--
Dingyuan Wang
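#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# Check every n-best line read from stdin: each feature name must be
# non-empty and each feature value must parse as a float.
import sys

for k, ln in enumerate(sys.stdin, 1):
    parts = [s.strip() for s in ln.split(' ||| ')]
    # The third field holds the feature scores.
    for token in parts[2].split():
        if token[-1] == '=':
            # A bare '=' is a feature with an empty name.
            assert len(token) > 1, '[%d] %s' % (k, ln.strip())
        else:
            float(token)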
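
The script above stops at the first corrupted line. A variant that reports every problem might look like the following. It is only a sketch: the function name `find_bad_feature_fields` and the reason strings are illustrative, and it assumes the same `id ||| hypothesis ||| features ||| score` layout as the sample n-best line quoted earlier.

```python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-


def find_bad_feature_fields(lines):
    """Yield (line_number, reason) for each problem found in n-best lines.

    `lines` is any iterable of strings, for example sys.stdin.
    The name and the reason strings are illustrative, not part of Moses.
    """
    for k, ln in enumerate(lines, 1):
        parts = [s.strip() for s in ln.split(' ||| ')]
        if len(parts) < 3:
            # No feature field at all; nothing further to check.
            yield k, 'fewer than three ||| fields'
            continue
        for token in parts[2].split():
            if token.endswith('='):
                # A bare '=' means the feature name is empty.
                if len(token) == 1:
                    yield k, 'feature with empty name'
            else:
                # Everything else should be a numeric feature value.
                try:
                    float(token)
                except ValueError:
                    yield k, 'non-numeric value %r' % token
```

It can be driven the same way as the original script by passing `sys.stdin` as `lines` and printing each `(line_number, reason)` pair, which makes it easier to count how many lines in a batch are affected.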