Hi Robin,
We already figured out the math. It would be great if you could briefly
proofread the changes the refactoring introduced.
--sebastian
On 04.07.2011 09:06, Robin Anil wrote:
On Wed, Jun 29, 2011 at 3:33 AM, Ted Dunning <[email protected]> wrote:
Hmmm... not sure. I thought they were all the same. It is possible there
is a left-over implementation.
Robin? Care to comment?
Didn't see the thread. Both are based on the same math. The naivebayes one
uses vectors instead of text.
On Tue, Jun 28, 2011 at 3:01 PM, Sebastian Schelter <[email protected]> wrote:
> Is org.apache.mahout.classifier.naivebayes also based on that one? I
> thought it was only relevant for org.apache.mahout.classifier.bayes?
>
>
> On 28.06.2011 23:58, Ted Dunning wrote:
>
>> See here:
>> http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.13.8572&rank=1
>>
>> On Tue, Jun 28, 2011 at 2:43 PM, Sebastian Schelter (JIRA)
>> <[email protected]> wrote:
>>
>>
>>> [
>>> https://issues.apache.org/jira/browse/MAHOUT-746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13056805#comment-13056805
>>> ]
>>>
>>> Sebastian Schelter commented on MAHOUT-746:
>>> -------------------------------------------
>>>
>>> Thank you very much, Sean.
>>>
>>> I wonder whether there is some article/paper that describes this particular
>>> approach to implementing Naive Bayes? A colleague of mine with a much deeper
>>> statistics background and I took a look at the details of the computation
>>> today and we were left with some open questions.
>>>
>>>> Refactoring of the parallel Naive Bayes implementation in
>>>> org.apache.mahout.classifier.naivebayes
>>>> ---------------------------------------------------------
>>>>
>>>> Key: MAHOUT-746
>>>> URL: https://issues.apache.org/jira/browse/MAHOUT-746
>>>> Project: Mahout
>>>> Issue Type: Improvement
>>>> Components: Classification
>>>> Affects Versions: 0.6
>>>> Reporter: Sebastian Schelter
>>>> Assignee: Sebastian Schelter
>>>> Fix For: 0.6
>>>>
>>>> Attachments: MAHOUT-746.patch
>>>>
>>>>
>>>> I refactored the code in org.apache.mahout.classifier.naivebayes to
>>>> extend AbstractJob, decoupled the model serialization from the job output,
>>>> extracted trainer classes and tried to clarify naming and reduce code
>>>> complexity. I also added tests for the training M/R code as well as a toy
>>>> integration test.
>>>>
>>>> It would be great if someone could review my patch to make sure I didn't
>>>> break anything.
>>>
>>> --
>>> This message is automatically generated by JIRA.
>>> For more information on JIRA, see: http://www.atlassian.com/software/jira
>>>
>>>
>>>
>>>
>>
>