[jira] Created: (MAHOUT-487) Issues with memory use and inconsistent or state-influenced results when using CBayesAlgorithm

Drew Farris (JIRA) Tue, 24 Aug 2010 05:59:00 -0700

Issues with memory use and inconsistent or state-influenced results when using CBayesAlgorithm
----------------------------------------------------------------------------------------------


                 Key: MAHOUT-487
                 URL: https://issues.apache.org/jira/browse/MAHOUT-487
             Project: Mahout
          Issue Type: Bug
          Components: Classification
    Affects Versions: 0.3
            Reporter: Drew Farris
            Priority: Minor


Came across this digging through the mailing list archives for something else, 
probably worth tracking as an issue.

{quote}
During classification, every previously unknown word is added to
featureDictionary. This leads to excessive growth if many texts
with unknown words are classified. The inconsistency is caused by
using a "vocabCount" that is not reset after each classification.
Indeed, featureDictionary.size() is used for "vocabCount", which
increases every time new unknown words are discovered.
{quote}
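To make the reported mechanism concrete, here is a minimal, hypothetical sketch (not Mahout's actual CBayesAlgorithm code). The class name `StatefulScorerDemo`, the per-label weight map, and the Laplace-style smoothing constant `alpha = 1.0` are all assumptions for illustration. It reproduces the two behaviours described in the quote: unknown words are inserted into `featureDictionary` during scoring, and `vocabCount` is re-read from `featureDictionary.size()` on every call, so scoring the same document twice can give different results depending on what was classified in between.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical, simplified scorer reproducing the reported behaviour.
 * NOT Mahout code: names and the smoothing formula are assumptions.
 */
public class StatefulScorerDemo {
    private final Map<String, Integer> featureDictionary = new HashMap<>();
    private final Map<String, Double> labelFeatureWeights = new HashMap<>();
    private final double labelTotal;
    private final double alpha = 1.0; // assumed Laplace-style smoothing constant

    public StatefulScorerDemo(Map<String, Double> trainedWeights) {
        double total = 0.0;
        for (Map.Entry<String, Double> e : trainedWeights.entrySet()) {
            featureDictionary.putIfAbsent(e.getKey(), featureDictionary.size());
            labelFeatureWeights.put(e.getKey(), e.getValue());
            total += e.getValue();
        }
        this.labelTotal = total;
    }

    /** Smoothed log-likelihood of the document under one label, with the reported side effect. */
    public double score(List<String> document) {
        for (String word : document) {
            // The reported problem: unknown words are registered permanently.
            featureDictionary.putIfAbsent(word, featureDictionary.size());
        }
        // vocabCount is read from the (possibly grown) dictionary on every call.
        int vocabCount = featureDictionary.size();
        double result = 0.0;
        for (String word : document) {
            double count = labelFeatureWeights.getOrDefault(word, 0.0);
            result += Math.log((count + alpha) / (labelTotal + alpha * vocabCount));
        }
        return result;
    }

    public int dictionarySize() {
        return featureDictionary.size();
    }

    public static void main(String[] args) {
        StatefulScorerDemo scorer = new StatefulScorerDemo(Map.of("apache", 3.0, "mahout", 2.0));
        List<String> doc = List.of("apache", "mahout");
        double before = scorer.score(doc);
        scorer.score(List.of("unseen1", "unseen2", "unseen3")); // grows the dictionary
        double after = scorer.score(doc);                        // same input, different score
        System.out.println("before=" + before + " after=" + after
                + " dictionarySize=" + scorer.dictionarySize());
    }
}
```

In this sketch the dictionary grows from 2 to 5 entries after the second call, so the denominator `labelTotal + alpha * vocabCount` changes from 7 to 10 and the score for the unchanged document drifts. Both symptoms from the report (unbounded memory growth and history-dependent results) come from the same side effect.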

See: 
http://www.lucidimagination.com/search/document/7dabe3efec8d136d/issues_with_memory_use_and_inconsistent_or_state_influenced_results_when_using_cbayesalgorit#8853165db260bf75

Alternatively, per Robin:

{quote}
We can remove the addition of features to the
dictionary altogether. Will yield better performance, and lock down the
model. Will require a bit more modification.
{quote}
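One possible reading of Robin's suggestion is sketched below as a hypothetical `LockedVocabScorerDemo` class (again not Mahout code; the class name, smoothing constant, and the choice to skip unknown words are assumptions). The vocabulary and `vocabCount` are fixed once when the model is built, and scoring never mutates model state, so the same document always receives the same score regardless of what was classified before.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of a "locked down" model: the dictionary is frozen
 * after training and unknown words are skipped at classification time.
 * NOT Mahout code: names and the smoothing formula are assumptions.
 */
public class LockedVocabScorerDemo {
    private final Map<String, Double> labelFeatureWeights;
    private final double labelTotal;
    private final int vocabCount;     // fixed once, at construction
    private final double alpha = 1.0; // assumed Laplace-style smoothing constant

    public LockedVocabScorerDemo(Map<String, Double> trainedWeights) {
        // Defensive, read-only copy: classification cannot add features.
        this.labelFeatureWeights = Collections.unmodifiableMap(new HashMap<>(trainedWeights));
        double total = 0.0;
        for (double w : trainedWeights.values()) {
            total += w;
        }
        this.labelTotal = total;
        this.vocabCount = trainedWeights.size();
    }

    /** Smoothed log-likelihood over known words only; has no side effects. */
    public double score(List<String> document) {
        double result = 0.0;
        for (String word : document) {
            Double count = labelFeatureWeights.get(word);
            if (count == null) {
                continue; // unknown word: ignored, model state untouched (assumed policy)
            }
            result += Math.log((count + alpha) / (labelTotal + alpha * vocabCount));
        }
        return result;
    }

    public int vocabularySize() {
        return vocabCount;
    }

    public static void main(String[] args) {
        LockedVocabScorerDemo scorer = new LockedVocabScorerDemo(Map.of("apache", 3.0, "mahout", 2.0));
        List<String> doc = List.of("apache", "mahout");
        double before = scorer.score(doc);
        scorer.score(List.of("unseen1", "unseen2", "unseen3")); // no effect on the model
        double after = scorer.score(doc);
        System.out.println("before=" + before + " after=" + after
                + " vocabularySize=" + scorer.vocabularySize());
    }
}
```

Skipping unknown words is only one option; another would be to still apply the smoothing term for them while keeping `vocabCount` fixed. Either way, the key property is that classification is read-only with respect to the trained model, which addresses both the memory growth and the state-dependent results.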


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
