That paper answered my questions, thank you, Ted.
I'll rework the patch a little to use variable names more consistent
with the paper. I also think my colleague was right when he suspected a
tiny bug that only occurs when one uses a smoothing parameter different
from one.
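Assume the smoothing parameter here is the usual additive (Laplace/Lidstone) constant alpha in the Naive Bayes term weights. Under that assumption, the Python sketch below shows how a denominator mistake can go unnoticed as long as alpha = 1. It is only a hypothetical illustration. It is not the Mahout code and not the actual bug in the patch. The function names and toy counts are made up for this example.

```python
from typing import Callable, List


def smoothed_weight(term_count: float, class_total: float,
                    vocab_size: int, alpha: float = 1.0) -> float:
    """Additive smoothing: (N_ci + alpha) / (N_c + alpha * |V|)."""
    return (term_count + alpha) / (class_total + alpha * vocab_size)


def buggy_smoothed_weight(term_count: float, class_total: float,
                          vocab_size: int, alpha: float = 1.0) -> float:
    """Hypothetical bug: alpha is dropped from the vocabulary term of the
    denominator, so the result matches the correct formula only when alpha == 1."""
    return (term_count + alpha) / (class_total + vocab_size)


def class_distribution(counts: List[float], alpha: float,
                       weight: Callable[..., float] = smoothed_weight) -> List[float]:
    """Smoothed per-term estimates for one class, given its raw term counts."""
    total = sum(counts)
    return [weight(c, total, len(counts), alpha) for c in counts]


if __name__ == "__main__":
    counts = [3, 0, 1]  # toy term counts for a single class, vocabulary of 3 terms
    for alpha in (1.0, 0.5):
        good = class_distribution(counts, alpha)
        bad = class_distribution(counts, alpha, buggy_smoothed_weight)
        print(alpha, sum(good), sum(bad))
```

With alpha = 1, both versions return identical estimates, so tests using the default parameter pass. With alpha = 0.5, only the correct version still sums to one over the vocabulary. The buggy one sums to 5.5/7.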
On 29.06.2011 00:03, Ted Dunning wrote:
Hmmm... not sure. I thought they were all the same. It is possible
there is a left-over implementation.
Robin? Care to comment?
On Tue, Jun 28, 2011 at 3:01 PM, Sebastian Schelter <[email protected]> wrote:
Is org.apache.mahout.classifier.naivebayes also based on that one?
I thought it was only relevant for org.apache.mahout.classifier.bayes?
On 28.06.2011 23:58, Ted Dunning wrote:
See here:
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.13.8572&rank=1
On Tue, Jun 28, 2011 at 2:43 PM, Sebastian Schelter (JIRA)
<[email protected]> wrote:
[ https://issues.apache.org/jira/browse/MAHOUT-746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13056805#comment-13056805 ]
Sebastian Schelter commented on MAHOUT-746:
-------------------------------------------
Thank you very much, Sean.
I wonder whether there is an article or paper that describes this
particular approach to implementing Naive Bayes. Today a colleague of
mine with a much deeper statistics background and I took a look at the
details of the computation, and we were left with some open questions.
Refactoring of the parallel Naive Bayes implementation in org.apache.mahout.classifier.naivebayes
-------------------------------------------------------------------------------------------------
Key: MAHOUT-746
URL: https://issues.apache.org/jira/browse/MAHOUT-746
Project: Mahout
Issue Type: Improvement
Components: Classification
Affects Versions: 0.6
Reporter: Sebastian Schelter
Assignee: Sebastian Schelter
Fix For: 0.6
Attachments: MAHOUT-746.patch
I refactored the code in org.apache.mahout.classifier.naivebayes to
extend AbstractJob, decoupled the model serialization from the job
output, extracted trainer classes, and tried to clarify naming and
reduce code complexity. I also added tests for the training M/R code
as well as a toy integration test.

It would be great if someone could review my patch to make sure I
didn't break anything.
--
This message is automatically generated by JIRA.
For more information on JIRA, see:
http://www.atlassian.com/software/jira