[jira] [Commented] (MAHOUT-1493) Port Naive Bayes to the Spark DSL

Pat Ferrel (JIRA) Sun, 21 Dec 2014 10:22:22 -0800

    [ 
https://issues.apache.org/jira/browse/MAHOUT-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14255237#comment-14255237
 ]


Pat Ferrel commented on MAHOUT-1493:
------------------------------------

I simplified the drivers, which made your driver.start overrides unnecessary, 
so I commented them out. They do no harm today, but they would if the default 
config ever changed.

Pardon my ignorance of the Naive Bayes driver, but do we really want to keep 
using sequence files? Even for spark-rowsimilarity we use delimited text to 
encode a DRM, which adds the ability to use user-specific IDs. This hides the 
Mahout IDs from a CLI user; those IDs seem to be the source of a great number 
of mistakes and mailing list questions.
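
To make the idea concrete, here is a minimal sketch of the ID translation. It 
is plain Python for readability, not the actual spark-rowsimilarity reader. 
The line format (row ID, tab, space-separated "column:value" pairs), the 
delimiters, and the function names read_text_drm and write_text_drm are all 
assumptions chosen for illustration.

```python
from collections import defaultdict


def read_text_drm(lines, row_delim="\t", element_delim=" ", value_delim=":"):
    """Parse lines such as 'u1<TAB>iphone:1.0 ipad:2.0' (hypothetical format).

    Returns a sparse matrix keyed by internal integer IDs, plus the two
    dictionaries that map external string IDs to those internal IDs.
    """
    row_ids = {}  # external row ID -> internal integer index
    col_ids = {}  # external column ID -> internal integer index
    rows = defaultdict(dict)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        row_key, _, rest = line.partition(row_delim)
        # Assign the next free index the first time an external ID is seen.
        r = row_ids.setdefault(row_key, len(row_ids))
        for element in rest.split(element_delim):
            if not element:
                continue
            col_key, _, value = element.partition(value_delim)
            c = col_ids.setdefault(col_key, len(col_ids))
            # A missing value is treated as 1.0 (an assumption of this sketch).
            rows[r][c] = float(value) if value else 1.0
    return dict(rows), row_ids, col_ids


def write_text_drm(rows, row_ids, col_ids,
                   row_delim="\t", element_delim=" ", value_delim=":"):
    """Render the matrix back to text using only the external IDs."""
    row_names = {index: key for key, index in row_ids.items()}
    col_names = {index: key for key, index in col_ids.items()}
    out = []
    for r in sorted(rows):
        elements = element_delim.join(
            f"{col_names[c]}{value_delim}{v}"
            for c, v in sorted(rows[r].items())
        )
        out.append(f"{row_names[r]}{row_delim}{elements}")
    return out
```

The internal integer indices exist only between the read and the write, so 
the CLI user supplies and receives their own IDs throughout.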

The internal item and row similarity code is structured around RDD-backed DRMs, 
so the DSL is honored. But for the CLI, text makes the input human-readable and 
language-neutral.

Since we no longer keep intermediate files around, do we have to live with 
the binary format going forward?

> Port Naive Bayes to the Spark DSL
> ---------------------------------
>
>                 Key: MAHOUT-1493
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1493
>             Project: Mahout
>          Issue Type: Bug
>          Components: Classification
>            Reporter: Sebastian Schelter
>            Assignee: Andrew Palumbo
>             Fix For: 1.0
>
>         Attachments: MAHOUT-1493.patch, MAHOUT-1493.patch, MAHOUT-1493.patch, 
> MAHOUT-1493.patch, MAHOUT-1493a.patch
>
>
> Port our Naive Bayes implementation to the new spark dsl. Shouldn't require 
> more than a few lines of code.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
