[ 
https://issues.apache.org/jira/browse/SPARK-5272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14279269#comment-14279269
 ] 

Joseph K. Bradley commented on SPARK-5272:
------------------------------------------

I like the idea of supporting multiple feature types; I think it should be 
doable, though we'll have to figure out a simple way to specify the type of 
each feature.  Decision trees support 2 types: categorical (which includes 
binary and unordered discrete values) and continuous (which includes ordered 
discrete values).  In DecisionTree, you specify categoricalFeaturesInfo, which 
says which features are categorical and gives their arity, but I hope this 
can become part of the SchemaRDD metadata before long.
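
To make the convention concrete, here is a minimal sketch in plain Python (no 
Spark dependency) of a categoricalFeaturesInfo-style map: feature index -> 
arity, where any feature not listed is treated as continuous.  The helper 
name feature_types and its output strings are hypothetical and exist only 
for illustration; they are not part of MLlib.

```python
from typing import Dict, List


def feature_types(num_features: int,
                  categorical_features_info: Dict[int, int]) -> List[str]:
    """Label every feature index as 'categorical(k)' or 'continuous'.

    categorical_features_info maps feature index -> arity (number of
    categories). Features absent from the map are treated as continuous.
    This helper is hypothetical; it only mirrors the shape of the map.
    """
    for index, arity in categorical_features_info.items():
        if not 0 <= index < num_features:
            raise ValueError(f"feature index {index} out of range")
        # Choice made by this sketch: a categorical feature needs >= 2 values.
        if arity < 2:
            raise ValueError(f"feature {index} has arity {arity}, expected >= 2")
    return [
        f"categorical({categorical_features_info[i]})"
        if i in categorical_features_info else "continuous"
        for i in range(num_features)
    ]


# Feature 0 is binary, feature 2 has 4 categories, features 1 and 3 are continuous.
types = feature_types(4, {0: 2, 2: 4})
print(types)
```

A map like this keeps the common case short (only the categorical features 
need to be listed), which is one reason it might carry over to NaiveBayes.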

I think we can take ideas from the DecisionTree API, just not much from the 
underlying implementation.
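
As a rough illustration of borrowing the API idea only, the sketch below is a 
small pure-Python naive Bayes that takes a categoricalFeaturesInfo-style map 
(feature index -> arity).  It assumes discrete labels, smoothed categorical 
likelihoods for the listed features, and Gaussian likelihoods for all other 
features.  The class name MixedNaiveBayes, its parameters, and the toy data 
are hypothetical; this is not the MLlib implementation or a proposed final 
API.

```python
import math
from typing import Dict, List, Sequence


class MixedNaiveBayes:
    """Hypothetical sketch: naive Bayes over categorical + continuous features."""

    def __init__(self, categorical_features_info: Dict[int, int],
                 smoothing: float = 1.0):
        # feature index -> arity; unlisted features are treated as continuous.
        self.info = dict(categorical_features_info)
        # Additive (Laplace) smoothing; keep it > 0 to avoid log(0).
        self.smoothing = smoothing

    def fit(self, rows: Sequence[Sequence[float]], labels: Sequence[int]):
        n = len(rows)
        num_features = len(rows[0])
        self.classes = sorted(set(labels))
        self.log_prior = {}
        self.cat_log_prob = {}  # (class, feature) -> log P(value | class)
        self.gauss = {}         # (class, feature) -> (mean, variance)
        for c in self.classes:
            members = [r for r, y in zip(rows, labels) if y == c]
            self.log_prior[c] = math.log(len(members) / n)
            for j in range(num_features):
                column = [r[j] for r in members]
                if j in self.info:
                    # Smoothed category counts for values 0 .. arity-1.
                    counts = [self.smoothing] * self.info[j]
                    for v in column:
                        counts[int(v)] += 1.0
                    total = sum(counts)
                    self.cat_log_prob[(c, j)] = [math.log(x / total)
                                                 for x in counts]
                else:
                    mean = sum(column) / len(column)
                    # Small constant keeps the variance strictly positive.
                    var = sum((v - mean) ** 2 for v in column) / len(column) + 1e-9
                    self.gauss[(c, j)] = (mean, var)
        return self

    def predict(self, row: Sequence[float]) -> int:
        def score(c: int) -> float:
            total = self.log_prior[c]
            for j, v in enumerate(row):
                if j in self.info:
                    total += self.cat_log_prob[(c, j)][int(v)]
                else:
                    mean, var = self.gauss[(c, j)]
                    # Log of the Gaussian density at v.
                    total += (-0.5 * math.log(2 * math.pi * var)
                              - (v - mean) ** 2 / (2 * var))
            return total
        return max(self.classes, key=score)


# Toy data: feature 0 is binary (categorical), feature 1 is continuous.
rows: List[List[float]] = [[0, 1.0], [0, 1.2], [1, 3.0], [1, 3.2]]
labels = [0, 0, 1, 1]
model = MixedNaiveBayes({0: 2}).fit(rows, labels)
print(model.predict([0, 1.1]), model.predict([1, 3.1]))
```

Only the feature-type map is taken from the DecisionTree API here; the 
per-feature likelihoods are specific to naive Bayes, and continuous labels 
are not covered by this sketch.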

> Refactor NaiveBayes to support discrete and continuous labels,features
> ----------------------------------------------------------------------
>
>                 Key: SPARK-5272
>                 URL: https://issues.apache.org/jira/browse/SPARK-5272
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>    Affects Versions: 1.2.0
>            Reporter: Joseph K. Bradley
>
> This JIRA is to discuss refactoring NaiveBayes in order to support both 
> discrete and continuous labels and features.
> Currently, NaiveBayes supports only discrete labels and features.
> Proposal: Generalize it to support continuous values as well.
> Some items to discuss are:
> * How commonly are continuous labels/features used in practice?  (Is this 
> necessary?)
> * What should the API look like?
> ** E.g., should NB have multiple classes for each type of label/feature, or 
> should it take a general Factor type parameter?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
