Tommaso Teofili created LUCENE-4927:
---------------------------------------
Summary: Prevent underflow in NB classifier likelihood calculation
Key: LUCENE-4927
URL: https://issues.apache.org/jira/browse/LUCENE-4927
Project: Lucene - Core
Issue Type: Bug
Components: modules/classification
Affects Versions: 4.2
Reporter: Tommaso Teofili
Assignee: Tommaso Teofili
Fix For: 5.0
The current likelihood calculation multiplies probabilities (each between
0 and 1). For longish docs containing words that are infrequent in some
class/category, the repeated _double_ multiplications can underflow to 0 even
though the true value is nonzero, so that class is wrongly assigned
probability 0.
Using log-likelihood and/or _BigDecimal_ would probably help.
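
A minimal sketch of the log-likelihood idea, assuming per-term conditional
probabilities are already available as a double[] (the class and method
names below are hypothetical and are not the classification module's API).
It contrasts the plain product with a sum of logarithms, which stays finite
where the product underflows:

```java
import java.util.Arrays;

// Hypothetical illustration only; not the Lucene classification module's code.
public class LogLikelihoodSketch {

    // Naive likelihood: product of per-term probabilities.
    // Underflows to 0.0 once the true value drops below the double range.
    static double productLikelihood(double[] termProbs) {
        double p = 1.0;
        for (double tp : termProbs) {
            p *= tp;
        }
        return p;
    }

    // Log-likelihood: sum of the logs of the same probabilities.
    // Assumes every probability is > 0 (e.g. after smoothing), since log(0)
    // is negative infinity.
    static double logLikelihood(double[] termProbs) {
        double sum = 0.0;
        for (double tp : termProbs) {
            sum += Math.log(tp);
        }
        return sum;
    }

    public static void main(String[] args) {
        // A "longish doc": 400 terms, each infrequent in both classes.
        double[] classA = new double[400];
        double[] classB = new double[400];
        Arrays.fill(classA, 1e-3);
        Arrays.fill(classB, 1e-4);

        // Both products underflow to 0.0, so the classes cannot be ranked.
        System.out.println(productLikelihood(classA));
        System.out.println(productLikelihood(classB));

        // The log sums stay finite and still rank class A above class B.
        System.out.println(logLikelihood(classA) > logLikelihood(classB));
    }
}
```

Since the logarithm is monotonic, picking the class with the highest
log-likelihood gives the same ranking as the exact product would, without
the cost of _BigDecimal_ arithmetic.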