Hi,

        Your command looks reasonable.  The toolkits differ in what they 
actually calculate.

MITLM says "Instead of estimating discount factors f[i] = D(i) from
count statistics, we can also tune them to minimize the development set 
perplexity, which has been observed to improve performance [1]."  Tuning 
on a development set is in fact better than what lmplz does, which is to 
estimate the discounts from count statistics.

IRSTLM isn't computing canonical modified Kneser-Ney.  Your data seems 
close to passing (it passed for orders 1-4), so my guess is that IRSTLM 
either uses a different formula (likely) or does not check for this 
condition.
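
For reference, here is a minimal Python sketch of the closed-form modified Kneser-Ney discounts (Chen and Goodman's formulas) together with the range check quoted in the error message.  It is an illustration, not the lmplz source: the function names are hypothetical and the count-of-counts values in the example are made up.  n1..n4 are the numbers of n-grams of a given order whose adjusted count is exactly 1..4.

```python
def modified_kn_discounts(n1, n2, n3, n4):
    """Closed-form discounts for adjusted counts 1, 2 and 3+."""
    y = n1 / (n1 + 2.0 * n2)
    return [
        1.0 - 2.0 * y * n2 / n1,  # D1
        2.0 - 3.0 * y * n3 / n2,  # D2
        3.0 - 4.0 * y * n4 / n3,  # D3+
    ]


def out_of_range(discounts):
    """Return the adjusted counts j whose discount is < 0 or > j."""
    return [j for j, d in enumerate(discounts, start=1) if d < 0.0 or d > j]


if __name__ == "__main__":
    # Made-up count-of-counts with far more count-3 than count-2 n-grams.
    discounts = modified_kn_discounts(n1=1000, n2=100, n3=300, n4=50)
    print(discounts, out_of_range(discounts))
```

D2 goes negative whenever 3 * Y * n3 / n2 exceeds 2, i.e. when there are many more n-grams with adjusted count 3 than with adjusted count 2.  Heavily duplicated text is one way to get count statistics like that.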

SRILM should encounter an error on this data.  If it doesn't, then I'll 
dig into the code.

        If you pipe the data through sort -u, how much smaller does it get and 
does that fix the issue?  Does it contain e.g. long product names that 
might mess up 5-gram statistics?
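
If it is easier than comparing file sizes, the shrinkage can be measured with a small script along these lines.  This is only a sketch: the function name is hypothetical, and it keeps every distinct line in memory, so it assumes the corpus fits in RAM.

```python
import sys


def duplicate_stats(lines):
    """Return (total line count, distinct line count) for an iterable of lines."""
    total = 0
    unique = set()
    for line in lines:
        total += 1
        unique.add(line)
    return total, len(unique)


if __name__ == "__main__":
    # Reads the corpus from standard input.
    total, unique = duplicate_stats(sys.stdin)
    print("total lines:", total)
    print("unique lines:", unique)
```

Saved under a name of your choice, it would be run with the corpus on standard input, the same way lmplz reads /data/lm.fr in your command.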

        I guess the reasonable thing to do in this situation would be to turn 
off the "modified" part of modified Kneser-Ney.  Instead of separate 
discounts for counts 1, 2, and 3+, it would use one discount for all 
counts, at least for 5-grams in your case.  I could implement this if you 
want.
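
To illustrate the idea, one plausible form of the single discount is the standard (unmodified) Kneser-Ney estimate D = n1 / (n1 + 2 * n2).  This is a sketch under that assumption, not a description of what lmplz does or would do, and the function name is hypothetical.

```python
def single_kn_discount(n1, n2):
    """One discount for all counts: D = n1 / (n1 + 2 * n2).

    n1 and n2 are the numbers of n-grams with adjusted count 1 and 2.
    """
    if n1 <= 0 or n2 < 0:
        raise ValueError("need n1 > 0 and n2 >= 0")
    return n1 / (n1 + 2.0 * n2)


if __name__ == "__main__":
    # Made-up count-of-counts for illustration only.
    print(single_kn_discount(n1=1000, n2=100))
```

Because n1 > 0 and n2 >= 0, this value always lies in (0, 1], so it cannot trip a check of the form "discount < 0 or discount > count".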

        Also, you're running a 64-bit machine, yes?

Kenneth

On 02/21/13 07:05, Darragh Whelan wrote:
> Hi everyone,
>
> I am trying to build a language model using lmplz but I am running into
> some problems.
>
> The error is:
>
>
> /adjust_counts.cc:50 in void
> lm::builder::<unnamed>::StatCollector::CalculateDiscounts() threw
> BadDiscountException because `discounts_[i].amount[j] < 0.0 ||
> discounts_[i].amount[j] > j'./
>
> /ERROR: 5-gram discount out of range for adjusted count 2: -5.03664/
>
> /Aborted/
>
>
> I have followed the Moses manual closely and the command I used to run was:
>
> /bin/lmplz -o 5 -S 80% -T  /tmp/ <  /data/lm.fr >
>   /engines/FR_FR/lm/lmplz.arpa
>
> I don’t think it is a problem with our data as I have been able to build
> a language model using IRSTLM and MITLM successfully.
>
> Could anyone please help us with getting lmplz to work with this data?
>
> Thanks,
>
> Darragh
>
> --
> Oracle <http://www.oracle.com/>
> Darragh Whelan | Software Engineer | +353 180 31922
>
> Oracle- WPTG
>
> East Point Business Park
>
> Clontarf
>
> Dublin
>
> Ireland
>
> Green Oracle <http://www.oracle.com/commitment>
>
>       
>
> Oracle is committed to developing practices and products that help
> protect the environment
>
>
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
>