--discount_fallback
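
For context, lmplz estimates modified Kneser-Ney discounts in closed form from the counts of n-grams with adjusted count 1 to 4, and --discount_fallback tells it to fall back to user-specified discounts when that estimate fails. The sketch below is not KenLM's code; it applies the standard Chen and Goodman formulas to made-up count-of-counts (both input tables are hypothetical, as are the function names) to show how a small, POS-tag-like vocabulary can push a discount below zero. The range check mirrors the condition quoted in the error message.

```python
def modified_kn_discounts(count_of_counts):
    """Closed-form modified Kneser-Ney discounts D_1..D_3.

    count_of_counts[k] is the number of n-grams whose adjusted count is k,
    for k = 1..4. This sketch assumes all four values are nonzero.
    """
    n = {k: count_of_counts[k] for k in (1, 2, 3, 4)}
    y = n[1] / (n[1] + 2 * n[2])
    # D_k = k - (k + 1) * Y * n_{k+1} / n_k
    return {k: k - (k + 1) * y * n[k + 1] / n[k] for k in (1, 2, 3)}


def out_of_range(discounts):
    """Return the discounts that violate 0 <= D_k <= k."""
    return {k: d for k, d in discounts.items() if d < 0.0 or d > k}


# Hypothetical word-like data: many singletons, counts fall off smoothly.
wordlike = modified_kn_discounts({1: 1000, 2: 400, 3: 200, 4: 120})

# Hypothetical tag-like data: very few rare unigrams, so n_4 > n_3.
taglike = modified_kn_discounts({1: 2, 2: 1, 3: 1, 4: 4})

print("word-like:", wordlike, "bad:", out_of_range(wordlike))
print("tag-like: ", taglike, "bad:", out_of_range(taglike))
```

With the tag-like table, the discount for adjusted count 3 comes out negative, which is the same kind of failure as the "1-gram discount out of range" error quoted below.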
On 05/31/15 20:13, Dingyuan Wang wrote:
> Dear all,
>
> When using lmplz to generate a 6-gram model for POS-tag-like data (small
> vocabulary, no real words), lmplz sometimes fails to run, depending on the
> dataset.
>
> lmplz was built from the latest code in either the mosesdecoder or the
> kenlm GitHub repo.
>
> The command line looks like this:
>
> somedir/lmplz -o 6 -S 50% --text foo.txt --arpa foo.lm
>
> Here is the stderr. The failing dataset is 2894562 lines, 92M.
>
> === 1/5 Counting and sorting n-grams ===
> Reading /home/gumble/[somedir]/zh-cn-nw-pos1.txt
> ----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
> ****************************************************************************************************
> Unigram tokens 41634632 types 114
> === 2/5 Calculating and sorting adjusted counts ===
> Chain sizes: 1:1368 2:241348592 3:452528608 4:724045824 5:1055900160
> 6:1448091648
> /home/gumble/github/kenlm/lm/builder/adjust_counts.cc:59 in void
> lm::builder::{anonymous}::StatCollector::CalculateDiscounts(const
> lm::builder::DiscountConfig&) threw BadDiscountException because
> `discounts_[i].amount[j] < 0.0 || discounts_[i].amount[j] > j'.
> ERROR: 1-gram discount out of range for adjusted count 3: -0.2
> Aborted (core dumped)
>
> I used `awk 'BEGIN {srand()} !/^$/ { if (rand() <= .0001) print }'
> zh-cn-nw-pos1.txt` to sample a few lines; the sample sometimes produces a
> similar error and sometimes runs successfully.
>
> The attached failed sample produces the error "ERROR: 1-gram discount
> out of range for adjusted count 2: -1.23077".
>
> Thanks.
>
>
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
>