Thank you for the fast response. I think bayes.* still does a little
more, since it also implements the text-specific parts of the paper.
Doesn't that need to be ported too?
--sebastian
On 04.07.2011 10:27, Robin Anil wrote:
Looks good. The examples need to be changed to use naivebayes.* instead
of bayes.*. After that, bayes.* can be deprecated and phased out.
On Mon, Jul 4, 2011 at 1:48 PM, Sebastian Schelter <[email protected]> wrote:
Hi Robin,
we already figured out the math. It would be great if you could do a
short proof-read of the changes the refactoring introduced.
--sebastian
On 04.07.2011 09:06, Robin Anil wrote:
On Wed, Jun 29, 2011 at 3:33 AM, Ted Dunning <[email protected]> wrote:
Hmmm... not sure. I thought they were all the same. It is possible there
is a left-over implementation.
Robin? Care to comment?
Didn't see the thread. Both are based on the same math. The naivebayes
one uses vectors instead of text.
On Tue, Jun 28, 2011 at 3:01 PM, Sebastian Schelter <[email protected]> wrote:
> Is org.apache.mahout.classifier.naivebayes also based on that one? I
> thought it was only relevant for org.apache.mahout.classifier.bayes?
>
>
> On 28.06.2011 23:58, Ted Dunning wrote:
>
>> See here:
>>
>> http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.13.8572&rank=1
>>
>> On Tue, Jun 28, 2011 at 2:43 PM, Sebastian Schelter (JIRA)
>> <[email protected]> wrote:
>>
>>
>>> [
>>> https://issues.apache.org/jira/browse/MAHOUT-746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13056805#comment-13056805
>>> ]
>>>
>>> Sebastian Schelter commented on MAHOUT-746:
>>> -------------------------------------------
>>>
>>> Thank you very much, Sean.
>>>
>>> I wonder whether there is some article/paper that describes this particular
>>> approach of implementing Naive Bayes? A colleague of mine with a much deeper
>>> statistics background and I took a look at the details of the computation
>>> today and we were left with some open questions.
>>>
>>>> Refactoring of the parallel Naive Bayes implementation in org.apache.mahout.classifier.naivebayes
>>>> -------------------------------------------------------------------------------------------------
>>>>
>>>> Key: MAHOUT-746
>>>> URL: https://issues.apache.org/jira/browse/MAHOUT-746
>>>> Project: Mahout
>>>> Issue Type: Improvement
>>>> Components: Classification
>>>> Affects Versions: 0.6
>>>> Reporter: Sebastian Schelter
>>>> Assignee: Sebastian Schelter
>>>> Fix For: 0.6
>>>>
>>>> Attachments: MAHOUT-746.patch
>>>>
>>>>
>>>> I refactored the code in org.apache.mahout.classifier.naivebayes to
>>>> extend AbstractJob, decoupled the model serialization from the job output,
>>>> extracted trainer classes and tried to clarify naming and reduce code
>>>> complexity. I also added tests for the training M/R code as well as a toy
>>>> integration test.
>>>>
>>>> It would be great if someone could review my patch to make sure I didn't
>>>> break anything.
>>>
>>> --
>>> This message is automatically generated by JIRA.
>>> For more information on JIRA, see: http://www.atlassian.com/software/jira
>>>
>>>
>>>
>>>
>>
>