Looks good. The examples need to be changed to use naivebayes.* instead of bayes.*. After that, bayes.* can be deprecated and phased out.

On Mon, Jul 4, 2011 at 1:48 PM, Sebastian Schelter <[email protected]> wrote:

> Hi Robin,
>
> we already figured out the math. It would be great if you could do a short
> proof-read of the changes the refactoring introduced.
>
> --sebastian
>
>
> On 04.07.2011 09:06, Robin Anil wrote:
>
>> On Wed, Jun 29, 2011 at 3:33 AM, Ted Dunning <[email protected]> wrote:
>>
>>     Hmmm... not sure. I thought they were all the same. It is possible
>>     there is a left-over implementation.
>>
>>     Robin? Care to comment?
>>
>> Didnt see the thread. Both are based on same math. naivebayes one uses
>> vectors instead of text
>>
>>
>> On Tue, Jun 28, 2011 at 3:01 PM, Sebastian Schelter <[email protected]> wrote:
>>
>> > Is org.apache.mahout.classifier.naivebayes also based on that one? I
>> > thought it was only relevant for org.apache.mahout.classifier.bayes?
>> >
>> >
>> > On 28.06.2011 23:58, Ted Dunning wrote:
>> >
>> >> See here:
>> >> http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.13.8572&rank=1
>> >>
>> >> On Tue, Jun 28, 2011 at 2:43 PM, Sebastian Schelter (JIRA)
>> >> <[email protected]> wrote:
>> >>
>> >>> [ https://issues.apache.org/jira/browse/MAHOUT-746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13056805#comment-13056805 ]
>> >>>
>> >>> Sebastian Schelter commented on MAHOUT-746:
>> >>> -------------------------------------------
>> >>>
>> >>> Thank you very much, Sean.
>> >>>
>> >>> I wonder whether there is some article/paper that describes this
>> >>> particular approach of implementing Naive Bayes? A colleague of mine
>> >>> with a much deeper statistics background and me took a look at the
>> >>> details of the computation today and we were left with some open
>> >>> questions.
>> >>>
>> >>>> Refactoring of the parallel Naive Bayes implementation in
>> >>>> org.apache.mahout.classifier.naivebayes
>> >>>> -----------------------------------------------------------------
>> >>>>
>> >>>> Key: MAHOUT-746
>> >>>> URL: https://issues.apache.org/jira/browse/MAHOUT-746
>> >>>> Project: Mahout
>> >>>> Issue Type: Improvement
>> >>>> Components: Classification
>> >>>> Affects Versions: 0.6
>> >>>> Reporter: Sebastian Schelter
>> >>>> Assignee: Sebastian Schelter
>> >>>> Fix For: 0.6
>> >>>>
>> >>>> Attachments: MAHOUT-746.patch
>> >>>>
>> >>>>
>> >>>> I refactored the code in org.apache.mahout.classifier.naivebayes to
>> >>>> extend AbstractJob, decoupled the model serialization from the job
>> >>>> output, extracted trainer classes and tried to clarify naming and
>> >>>> reduce code complexity. I also added tests for the training M/R code
>> >>>> as well as a toy integration test.
>> >>>> It would be great if someone could review my patch to make sure I
>> >>>> didn't break anything.
>> >>>
>> >>> --
>> >>> This message is automatically generated by JIRA.
>> >>> For more information on JIRA, see: http://www.atlassian.com/software/jira
