Tommaso Teofili created LUCENE-4927:
---------------------------------------

             Summary: Prevent underflow in NB classifier likelihood calculation
                 Key: LUCENE-4927
                 URL: https://issues.apache.org/jira/browse/LUCENE-4927
             Project: Lucene - Core
          Issue Type: Bug
          Components: modules/classification
    Affects Versions: 4.2
            Reporter: Tommaso Teofili
            Assignee: Tommaso Teofili
             Fix For: 5.0


The current likelihood calculation multiplies probabilities (whose values are 
between 0 and 1). For longish docs containing words that are infrequent in some 
class/category, the repeated _double_ multiplications can underflow and return 
0 even though that is not the correct value (thus wrongly assigning such a 
class a probability of 0 too).
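
As a rough illustration of the underflow (a standalone sketch, not the 
classifier's actual code; the class name, the method name and the 1e-4 
per-term probability are made up for the example):

```java
public class UnderflowDemo {
    // Multiplies the same per-term probability n times, the way a naive
    // likelihood product over n terms would (hypothetical helper).
    static double product(double p, int n) {
        double result = 1.0;
        for (int i = 0; i < n; i++) {
            result *= p;
        }
        return result;
    }

    public static void main(String[] args) {
        // 50 terms: tiny, but still a positive double.
        System.out.println(product(1e-4, 50));
        // 100 terms: the true value (1e-400) is below the smallest
        // positive double, so the product collapses to 0.0.
        System.out.println(product(1e-4, 100));
    }
}
```

With these assumed numbers a 100-term doc already gets likelihood 0 for the 
class, although the mathematically correct value is non-zero.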

Using log-likelihood (summing logarithms instead of multiplying probabilities) 
and/or _BigDecimal_ would probably help.
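
A minimal sketch of the log-likelihood idea, assuming every per-term 
probability is strictly positive (e.g. after smoothing). The class and method 
names are hypothetical and not part of the Lucene classification API; the 
normalization step uses the usual log-sum-exp trick, which is one possible 
choice and not necessarily what the fix will use:

```java
public class LogLikelihoodSketch {
    // Sums log-probabilities instead of multiplying probabilities, so the
    // result stays a finite negative number even for long docs.
    static double logLikelihood(double[] termProbabilities) {
        double sum = 0.0;
        for (double p : termProbabilities) {
            sum += Math.log(p);
        }
        return sum;
    }

    // Turns per-class log scores into probabilities that sum to 1.
    // Subtracting the max score first keeps Math.exp from underflowing
    // for the best class (log-sum-exp trick).
    static double[] normalize(double[] logScores) {
        double max = Double.NEGATIVE_INFINITY;
        for (double s : logScores) {
            max = Math.max(max, s);
        }
        double total = 0.0;
        for (double s : logScores) {
            total += Math.exp(s - max);
        }
        double[] probabilities = new double[logScores.length];
        for (int i = 0; i < logScores.length; i++) {
            probabilities[i] = Math.exp(logScores[i] - max) / total;
        }
        return probabilities;
    }

    public static void main(String[] args) {
        // Made-up data: 100 terms, each with probability 1e-4 in class A
        // and 1e-5 in class B.
        double[] classA = new double[100];
        double[] classB = new double[100];
        for (int i = 0; i < 100; i++) {
            classA[i] = 1e-4;
            classB[i] = 1e-5;
        }
        double[] scores = { logLikelihood(classA), logLikelihood(classB) };
        double[] probabilities = normalize(scores);
        System.out.println(scores[0] + " " + scores[1]);
        System.out.println(probabilities[0] + " " + probabilities[1]);
    }
}
```

Since the logarithm is monotonic, comparing the summed logs picks the same 
class as comparing the products would, without ever forming the product.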

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
