[
https://issues.apache.org/jira/browse/SPARK-5272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14279269#comment-14279269
]
Joseph K. Bradley commented on SPARK-5272:
------------------------------------------
I like the idea of supporting multiple feature types. It should be doable,
though we will need a simple way to specify the type of each feature.
Decision trees support two types: categorical (which includes binary and
unordered discrete values) and continuous (which includes ordered discrete
values). In DecisionTree, you pass categoricalFeaturesInfo, which says which
features are categorical and gives their arity, but I hope this information
can become part of the SchemaRDD metadata before long.
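
To make the categoricalFeaturesInfo convention concrete, here is a minimal sketch in plain Python (no Spark dependency). It assumes the same shape as the DecisionTree parameter: a map from feature index to arity, where any feature not listed is treated as continuous. The helper name feature_types and its output strings are hypothetical, invented only for illustration, and the index check is this sketch's own choice.

```python
from typing import Dict, List


def feature_types(num_features: int,
                  categorical_features_info: Dict[int, int]) -> List[str]:
    """Label each feature index as categorical (with its arity) or continuous.

    Hypothetical helper: mirrors the {feature index: arity} convention of
    categoricalFeaturesInfo, but is not part of any Spark API.
    """
    # Sketch-level validation: every listed index must refer to a real feature.
    for index in categorical_features_info:
        if not 0 <= index < num_features:
            raise ValueError(f"feature index {index} out of range")

    # Features absent from the map are assumed to be continuous.
    return [
        f"categorical({categorical_features_info[i]})"
        if i in categorical_features_info else "continuous"
        for i in range(num_features)
    ]


# Feature 0 is binary and feature 2 has 4 unordered categories;
# features 1 and 3 are not listed, so they are treated as continuous.
info = {0: 2, 2: 4}
types = feature_types(4, info)
print(types)
```

A per-feature map like this keeps the common case (all features continuous) as an empty map, which is one reason it could be a reasonable starting point for a mixed-type NaiveBayes API.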
I think we can take ideas from the DecisionTree API, just not much from the
underlying implementation.
> Refactor NaiveBayes to support discrete and continuous labels,features
> ----------------------------------------------------------------------
>
> Key: SPARK-5272
> URL: https://issues.apache.org/jira/browse/SPARK-5272
> Project: Spark
> Issue Type: Improvement
> Components: MLlib
> Affects Versions: 1.2.0
> Reporter: Joseph K. Bradley
>
> This JIRA is to discuss refactoring NaiveBayes in order to support both
> discrete and continuous labels and features.
> Currently, NaiveBayes supports only discrete labels and features.
> Proposal: Generalize it to support continuous values as well.
> Some items to discuss are:
> * How commonly are continuous labels/features used in practice? (Is this
> necessary?)
> * What should the API look like?
> ** E.g., should NB have multiple classes for each type of label/feature, or
> should it take a general Factor type parameter?