Issues with memory use and inconsistent or state-influenced results when using
CBayesAlgorithm
----------------------------------------------------------------------------------------------
Key: MAHOUT-487
URL: https://issues.apache.org/jira/browse/MAHOUT-487
Project: Mahout
Issue Type: Bug
Components: Classification
Affects Versions: 0.3
Reporter: Drew Farris
Priority: Minor
Came across this while digging through the mailing list archives for something
else; it is probably worth tracking as an issue.
{quote}
During classification, every word that is still unknown is added to
featureDictionary. This leads to excessive growth if lots of texts
with unknown words are to be classified. The inconsistency is caused by
using a "vocabCount" that is not reset after each classification.
Indeed, featureDictionary.size() is used for "vocabCount", which
increases every time new unknown words are discovered.
{quote}
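The effect described in the quote can be reproduced with a small toy model. This is only a sketch under stated assumptions: the class GrowingDictionaryDemo, its methods, and the add-one smoothing formula are hypothetical and are not taken from Mahout's CBayesAlgorithm. The only things it borrows from the report are the names featureDictionary and vocabCount, and the fact that vocabCount is read from featureDictionary.size().

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model, NOT Mahout code: every class and method name here is hypothetical.
public class GrowingDictionaryDemo {
    private final Map<String, Integer> featureDictionary = new HashMap<>();
    private final Map<String, Integer> counts = new HashMap<>();
    private int totalCount = 0;

    // "Training": register a known word and how often it occurred.
    public void train(String word, int count) {
        featureDictionary.putIfAbsent(word, featureDictionary.size());
        counts.merge(word, count, Integer::sum);
        totalCount += count;
    }

    // Classification that mutates the dictionary as a side effect.
    public double score(List<String> document) {
        for (String word : document) {
            // Unknown words are added, so the dictionary only ever grows.
            featureDictionary.putIfAbsent(word, featureDictionary.size());
        }
        // vocabCount now depends on every document scored so far.
        int vocabCount = featureDictionary.size();
        double logScore = 0.0;
        for (String word : document) {
            int count = counts.getOrDefault(word, 0);
            // Assumed add-one smoothing; the denominator shifts with vocabCount.
            logScore += Math.log((count + 1.0) / (totalCount + vocabCount));
        }
        return logScore;
    }

    public int dictionarySize() {
        return featureDictionary.size();
    }

    public static void main(String[] args) {
        GrowingDictionaryDemo demo = new GrowingDictionaryDemo();
        demo.train("spam", 3);
        demo.train("ham", 1);

        List<String> doc = List.of("spam", "foo");
        double first = demo.score(doc);
        demo.score(List.of("bar", "baz")); // unrelated document with new words
        double second = demo.score(doc);

        System.out.println("same document, first score:  " + first);
        System.out.println("same document, second score: " + second);
        System.out.println("dictionary size after scoring: " + demo.dictionarySize());
    }
}
```

In this sketch the same document receives two different scores, because the unrelated document in between enlarged the dictionary and therefore the smoothing denominator. The dictionary also keeps every unknown word it has ever seen, which is the memory growth mentioned in the report.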
See:
http://www.lucidimagination.com/search/document/7dabe3efec8d136d/issues_with_memory_use_and_inconsistent_or_state_influenced_results_when_using_cbayesalgorit#8853165db260bf75
Alternatively, per Robin:
{quote}
We can remove the addition of features to the
dictionary altogether. Will yield better performance, and lock down the
model. Will require a bit more modification.
{quote}
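A minimal sketch of what "locking down the model" could look like, again as a toy and not as the actual Mahout change. The class FrozenDictionaryDemo and its methods are hypothetical, and treating an unknown word as a zero-count feature with add-one smoothing is an assumption made for illustration. The point is only that classification performs read-only lookups and that vocabCount is fixed when the model is built.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model, NOT Mahout code: every class and method name here is hypothetical.
public class FrozenDictionaryDemo {
    private final Map<String, Integer> counts; // immutable after construction
    private final int totalCount;
    private final int vocabCount;              // fixed at training time

    public FrozenDictionaryDemo(Map<String, Integer> trainedCounts) {
        this.counts = Map.copyOf(trainedCounts);
        int total = 0;
        for (int c : counts.values()) {
            total += c;
        }
        this.totalCount = total;
        this.vocabCount = counts.size();
    }

    // Classification only reads the model; unknown words are never stored.
    public double score(List<String> document) {
        double logScore = 0.0;
        for (String word : document) {
            // Assumption: an unknown word is scored as a zero-count feature.
            int count = counts.getOrDefault(word, 0);
            logScore += Math.log((count + 1.0) / (totalCount + vocabCount));
        }
        return logScore;
    }

    public int dictionarySize() {
        return counts.size();
    }

    public static void main(String[] args) {
        Map<String, Integer> trained = new HashMap<>();
        trained.put("spam", 3);
        trained.put("ham", 1);
        FrozenDictionaryDemo model = new FrozenDictionaryDemo(trained);

        List<String> doc = List.of("spam", "foo");
        double first = model.score(doc);
        model.score(List.of("bar", "baz")); // unrelated document with new words
        double second = model.score(doc);

        System.out.println("scores identical: " + (first == second));
        System.out.println("dictionary size: " + model.dictionarySize());
    }
}
```

Because nothing is written during score(), the result for a document no longer depends on what was classified before it, and memory use stays constant regardless of how many unknown words appear.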