[ https://issues.apache.org/jira/browse/MAHOUT-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14396051#comment-14396051 ]
ASF GitHub Bot commented on MAHOUT-1493:
----------------------------------------
GitHub user andrewpalumbo opened a pull request:
https://github.com/apache/mahout/pull/104
MAHOUT-1493 parallelize SparkNaiveBayes.test(...)
Explicitly define math-scala NaiveBayes.test(...) as sequential and in
memory. Override test(...) in SparkNaiveBayes and distribute the classification
process. Also includes some general cleanup.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/andrewpalumbo/mahout MAHOUT-1493-serialize
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/mahout/pull/104.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #104
----
commit 98fc94484a408ecc9e433babaee772258ba9bae9
Author: Andrew Palumbo <[email protected]>
Date: 2015-04-05T01:27:47Z
Add a prefix directory for model.dfsWrite(...) to write components. Add
tests for full training and testing on seeded random toy TF-IDF data
commit 1049b461464ab27a56f742d96c5b5c73b040ec1b
Author: Andrew Palumbo <[email protected]>
Date: 2015-04-05T01:33:23Z
Use SparkNaiveBayes rather than NaiveBayes in CLI drivers to avoid confusion
commit 6c5dc8f3359255f85fbbbe2ffb24681fa489255d
Author: Andrew Palumbo <[email protected]>
Date: 2015-04-05T02:40:44Z
Override NaiveBayes.test in Spark and broadcast the classifier to the
closure. It no longer pulls everything into memory up front
commit 2522a029ecbf6389f09ab69f80f05a298bcd7ea0
Author: Andrew Palumbo <[email protected]>
Date: 2015-04-05T02:55:08Z
Make math-scala NaiveBayes.test(...) explicitly sequential
commit bad6d9a06e3e04cfdafcce7d1adc2066ecd991b7
Author: Andrew Palumbo <[email protected]>
Date: 2015-04-05T03:29:36Z
Cleanup
----
> Port Naive Bayes to the Spark DSL
> ---------------------------------
>
> Key: MAHOUT-1493
> URL: https://issues.apache.org/jira/browse/MAHOUT-1493
> Project: Mahout
> Issue Type: Bug
> Components: Classification
> Reporter: Sebastian Schelter
> Assignee: Andrew Palumbo
> Labels: DSL, h2o, scala
> Fix For: 0.10.0
>
> Attachments: MAHOUT-1493.patch, MAHOUT-1493.patch, MAHOUT-1493.patch,
> MAHOUT-1493.patch, MAHOUT-1493a.patch
>
>
> Port our Naive Bayes implementation to the new Spark DSL. It shouldn't require
> more than a few lines of code.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)