[
https://issues.apache.org/jira/browse/MAHOUT-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14255237#comment-14255237
]
Pat Ferrel commented on MAHOUT-1493:
------------------------------------
I simplified the drivers, which made your driver.start overrides unnecessary,
so I commented them out. They do no harm now, but they would if the default
config ever changed.
Pardon my ignorance of the Naive Bayes driver, but do we really want to keep
using sequence files? Even for spark-rowsimilarity we use delimited text to
encode a DRM, which adds the ability to use user-specific IDs. This has the
benefit of hiding the Mahout IDs from a CLI user; exposing those IDs seems to be
the source of a great number of mistakes and mailing list questions.
The internal item and row similarity code is structured around RDD-backed DRMs,
so the DSL is honored. But for the CLI, text makes the input human-readable and
language-neutral.
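To make the text-vs-binary point concrete, here is a minimal Python sketch (not Mahout code) of how a delimited-text DRM can carry user-specific IDs: the reader assigns internal integer row and column indices and keeps dictionaries back to the original IDs, and the writer restores those IDs on output. The line layout (a row ID, a tab, then space-separated columnID:value pairs) and the function names read_text_drm and write_text_drm are illustrative assumptions, not the actual spark-rowsimilarity defaults or API.

```python
def read_text_drm(lines, row_delim="\t", elem_delim=" ", kv_delim=":"):
    """Parse delimited text into integer-keyed sparse rows plus ID dictionaries.

    Assumed (hypothetical) line layout: rowID<TAB>colID:value colID:value ...
    """
    row_ids = {}  # user row ID -> internal int index
    col_ids = {}  # user column ID -> internal int index
    rows = {}     # internal row index -> {internal column index: value}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        row_key, _, elems = line.partition(row_delim)
        # setdefault hands out the next free index only for unseen IDs
        r = row_ids.setdefault(row_key, len(row_ids))
        vec = rows.setdefault(r, {})
        for elem in elems.split(elem_delim):
            if not elem:
                continue
            col_key, _, value = elem.rpartition(kv_delim)
            c = col_ids.setdefault(col_key, len(col_ids))
            vec[c] = float(value)
    return rows, row_ids, col_ids


def write_text_drm(rows, row_ids, col_ids, row_delim="\t", elem_delim=" ", kv_delim=":"):
    """Render integer-keyed rows back to text using the original user IDs."""
    row_names = {i: k for k, i in row_ids.items()}
    col_names = {i: k for k, i in col_ids.items()}
    out = []
    for r in sorted(rows):
        elems = elem_delim.join(
            f"{col_names[c]}{kv_delim}{v:g}" for c, v in sorted(rows[r].items())
        )
        out.append(f"{row_names[r]}{row_delim}{elems}")
    return out


if __name__ == "__main__":
    text = ["user-a\tipad:1 iphone:2", "user-b\tiphone:1 galaxy:3"]
    rows, row_ids, col_ids = read_text_drm(text)
    # Internally only integer indices are used; the user IDs survive the round trip.
    print(write_text_drm(rows, row_ids, col_ids) == text)  # True
```

Because the CLI user only ever sees their own IDs, there is no Mahout integer ID for them to get wrong, and any language that can split strings can produce or consume such files.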
Since we no longer keep intermediate files around, do we have to live with the
binary format going forward?
> Port Naive Bayes to the Spark DSL
> ---------------------------------
>
> Key: MAHOUT-1493
> URL: https://issues.apache.org/jira/browse/MAHOUT-1493
> Project: Mahout
> Issue Type: Bug
> Components: Classification
> Reporter: Sebastian Schelter
> Assignee: Andrew Palumbo
> Fix For: 1.0
>
> Attachments: MAHOUT-1493.patch, MAHOUT-1493.patch, MAHOUT-1493.patch,
> MAHOUT-1493.patch, MAHOUT-1493a.patch
>
>
> Port our Naive Bayes implementation to the new spark dsl. Shouldn't require
> more than a few lines of code.