Hi Robin,

We already figured out the math. It would be great if you could do a short proofread of the changes the refactoring introduced.

--sebastian

On 04.07.2011 09:06, Robin Anil wrote:


On Wed, Jun 29, 2011 at 3:33 AM, Ted Dunning <[email protected]> wrote:

    Hmmm... not sure.  I thought they were all the same.  It is possible
    there is a left-over implementation.

    Robin?  Care to comment?

Didn't see the thread. Both are based on the same math. The naivebayes one
uses vectors instead of text.


    On Tue, Jun 28, 2011 at 3:01 PM, Sebastian Schelter <[email protected]> wrote:

     > Is org.apache.mahout.classifier.naivebayes also based on that one? I
     > thought it was only relevant for org.apache.mahout.classifier.bayes?
     >
     >
     > On 28.06.2011 23:58, Ted Dunning wrote:
     >
     >> See here:
     >> http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.13.8572&rank=1
     >>
     >> On Tue, Jun 28, 2011 at 2:43 PM, Sebastian Schelter (JIRA)
     >> <[email protected]> wrote:
     >>
     >>
     >>>    [
     >>> https://issues.apache.org/jira/browse/MAHOUT-746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13056805#comment-13056805
     >>> ]
     >>>
     >>> Sebastian Schelter commented on MAHOUT-746:
     >>> -------------------------------------------
     >>>
     >>> Thank you very much, Sean.
     >>>
     >>> I wonder whether there is some article/paper that describes this
     >>> particular approach of implementing Naive Bayes? A colleague of mine
     >>> with a much deeper statistics background and I took a look at the
     >>> details of the computation today and we were left with some open
     >>> questions.
     >>>
     >>>> Refactoring of the parallel Naive Bayes implementation in
     >>>> org.apache.mahout.classifier.naivebayes
     >>>> -------------------------------------------------------------------
     >>>>
     >>>>                 Key: MAHOUT-746
     >>>>                 URL: https://issues.apache.org/jira/browse/MAHOUT-746
     >>>>             Project: Mahout
     >>>>          Issue Type: Improvement
     >>>>          Components: Classification
     >>>>    Affects Versions: 0.6
     >>>>            Reporter: Sebastian Schelter
     >>>>            Assignee: Sebastian Schelter
     >>>>             Fix For: 0.6
     >>>>
     >>>>         Attachments: MAHOUT-746.patch
     >>>>
     >>>>
     >>>> I refactored the code in org.apache.mahout.classifier.naivebayes to
     >>>> extend AbstractJob, decoupled the model serialization from the job
     >>>> output, extracted trainer classes and tried to clarify naming and
     >>>> reduce code complexity. I also added tests for the training M/R code
     >>>> as well as a toy integration test.
     >>>>
     >>>> It would be great if someone could review my patch to make sure I
     >>>> didn't break anything.
     >>>
     >>> --
     >>> This message is automatically generated by JIRA.
     >>> For more information on JIRA, see: http://www.atlassian.com/software/jira
     >>>
     >>>
     >>>
     >>>
     >>
     >


