Thank you for the fast response. I think bayes.* still does a bit more, as it also implements the text-specific parts of the paper. Doesn't that need to be ported too?

--sebastian

On 04.07.2011 10:27, Robin Anil wrote:
Looks good. The examples need to be changed to use naivebayes.* instead of
bayes.*. After that, bayes.* can be deprecated and phased out.



On Mon, Jul 4, 2011 at 1:48 PM, Sebastian Schelter <[email protected]> wrote:

    Hi Robin,

    We have already figured out the math. It would be great if you could do a
    short proofread of the changes the refactoring introduced.

    --sebastian


    On 04.07.2011 09:06, Robin Anil wrote:



        On Wed, Jun 29, 2011 at 3:33 AM, Ted Dunning <[email protected]> wrote:

            Hmmm... not sure.  I thought they were all the same.  It is possible
            there is a left-over implementation.

            Robin?  Care to comment?

        Didn't see the thread. Both are based on the same math. The naivebayes
        one uses vectors instead of text.


            On Tue, Jun 28, 2011 at 3:01 PM, Sebastian Schelter <[email protected]> wrote:

         > Is org.apache.mahout.classifier.naivebayes also based on that one? I
         > thought it was only relevant for org.apache.mahout.classifier.bayes?
         >
         >
         > On 28.06.2011 23:58, Ted Dunning wrote:
         >
         >> See here:
         >>
         >> http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.13.8572&rank=1
         >>
         >> On Tue, Jun 28, 2011 at 2:43 PM, Sebastian Schelter (JIRA)
         >> <[email protected]> wrote:

         >>
         >>
         >>>    [ https://issues.apache.org/jira/browse/MAHOUT-746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13056805#comment-13056805 ]
         >>>
         >>> Sebastian Schelter commented on MAHOUT-746:
         >>> -------------------------------------------
         >>>
         >>> Thank you very much, Sean.
         >>>
         >>> I wonder whether there is an article or paper that describes this
         >>> particular approach to implementing Naive Bayes. A colleague of mine with
         >>> a much deeper statistics background and I looked at the details of the
         >>> computation today and were left with some open questions.
         >>>
         >>>> Refactoring of the parallel Naive Bayes implementation in org.apache.mahout.classifier.naivebayes
         >>>> ------------------------------------------------------------------------------------------------
         >>>>
         >>>>                 Key: MAHOUT-746
         >>>>                 URL: https://issues.apache.org/jira/browse/MAHOUT-746
         >>>>             Project: Mahout
         >>>>          Issue Type: Improvement
         >>>>          Components: Classification
         >>>>    Affects Versions: 0.6
         >>>>            Reporter: Sebastian Schelter
         >>>>            Assignee: Sebastian Schelter
         >>>>             Fix For: 0.6
         >>>>
         >>>>         Attachments: MAHOUT-746.patch
         >>>>
         >>>>
         >>>> I refactored the code in org.apache.mahout.classifier.naivebayes to
         >>>> extend AbstractJob, decoupled the model serialization from the job
         >>>> output, extracted trainer classes and tried to clarify naming and reduce
         >>>> code complexity. I also added tests for the training M/R code as well as
         >>>> a toy integration test.
         >>>
         >>>> It would be great if someone could review my patch to make sure I didn't
         >>>> break anything.
         >>>
         >>> --
         >>> This message is automatically generated by JIRA.
         >>> For more information on JIRA, see: http://www.atlassian.com/software/jira
         >>>
         >>>
         >>>
         >>>
         >>
         >




