[ https://issues.apache.org/jira/browse/MAHOUT-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sebastian Schelter updated MAHOUT-1493:
---------------------------------------
Attachment: MAHOUT-1493.patch
Updated the patch according to Dmitriy's style suggestions.
The only step executed in a distributed fashion is the summation of the
observations for each label. The rest of the training runs in memory on the
driver. This should not be a problem, as the data used by the local training
already has to fit in memory for the existing NaiveBayesModel.
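The split described above can be sketched as follows. This is a minimal illustration in Python, not the patch's Spark DSL code. It assumes a multinomial Naive Bayes with Laplace smoothing (alpha) and no class priors, and all function names are hypothetical. Stage 1 stands in for the distributed per-label summation (a local reduction here instead of a reduce-by-key on the cluster). Stage 2 is the in-memory part on the driver, which only ever sees one summed vector per label:

```python
import math


def sum_observations_by_label(observations):
    """Stage 1 (distributed in the patch): sum the feature vectors of all
    observations sharing a label. A plain local loop stands in for the
    cluster-side reduction in this sketch."""
    sums = {}
    for label, vector in observations:
        acc = sums.setdefault(label, [0.0] * len(vector))
        for i, value in enumerate(vector):
            acc[i] += value
    return sums


def train_in_memory(label_sums, alpha=1.0):
    """Stage 2 (on the driver): derive per-label totals, per-feature totals
    and Laplace-smoothed log-probabilities from the small labels x features
    matrix produced by stage 1."""
    labels = sorted(label_sums)
    num_features = len(label_sums[labels[0]])
    weights_per_label = {l: sum(label_sums[l]) for l in labels}
    weights_per_feature = [
        sum(label_sums[l][i] for l in labels) for i in range(num_features)
    ]
    log_probs = {}
    for l in labels:
        denom = weights_per_label[l] + alpha * num_features
        log_probs[l] = [
            math.log((label_sums[l][i] + alpha) / denom)
            for i in range(num_features)
        ]
    return weights_per_label, weights_per_feature, log_probs


def classify(log_probs, vector):
    """Pick the label whose log-probabilities give the highest dot product
    with the input feature vector (no class priors in this sketch)."""
    scores = {
        l: sum(x * lp for x, lp in zip(vector, lps))
        for l, lps in log_probs.items()
    }
    return max(scores, key=scores.get)


observations = [
    ("spam", [3.0, 0.0, 1.0]),
    ("ham", [0.0, 2.0, 1.0]),
    ("spam", [1.0, 0.0, 0.0]),
]
label_sums = sum_observations_by_label(observations)
_, _, model = train_in_memory(label_sums)
print(classify(model, [2.0, 0.0, 0.0]))
```

The driver-side state grows with labels × features, not with the number of observations, which is why keeping stage 2 local is reasonable as long as that matrix fits in memory.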
The patch still needs to be verified on a real example. I will probably do this
in May. If someone wants to test it before then, I'd be happy to help.
> Port Naive Bayes to the Spark DSL
> ---------------------------------
>
> Key: MAHOUT-1493
> URL: https://issues.apache.org/jira/browse/MAHOUT-1493
> Project: Mahout
> Issue Type: Bug
> Components: Classification
> Reporter: Sebastian Schelter
> Assignee: Sebastian Schelter
> Fix For: 1.0
>
> Attachments: MAHOUT-1493.patch, MAHOUT-1493.patch
>
>
> Port our Naive Bayes implementation to the new Spark DSL. It shouldn't require
> more than a few lines of code.
--
This message was sent by Atlassian JIRA
(v6.2#6252)