[ 
https://issues.apache.org/jira/browse/SPARK-4894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277933#comment-14277933
 ] 

Joseph K. Bradley commented on SPARK-4894:
------------------------------------------

+1 for small changes, but occasionally larger ones are needed.  The jump to the 
spark.ml package might be a good time to make such a switch.

We don't want to try to approach the complexity of Factorie, but it should not 
be too hard to support continuous variables.  Gaussian NB would surely be the 
first generalization to add.
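
As a rough illustration of what the Gaussian generalization involves, here is a 
minimal sketch in plain Python (for brevity; this is not MLlib code, and the 
names fit_gaussian_nb and predict_gaussian_nb are hypothetical).  It assumes 
each continuous feature is modeled by one Gaussian per class, with a small 
variance floor to avoid division by zero:

```python
import math
from collections import defaultdict


def fit_gaussian_nb(rows, labels, var_floor=1e-9):
    """Estimate the log prior and per-feature mean/variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)

    model = {}
    for label, members in by_class.items():
        n = len(members)
        columns = list(zip(*members))
        means = [sum(col) / n for col in columns]
        # Maximum-likelihood variance, floored so it is never zero.
        variances = [
            max(sum((x - m) ** 2 for x in col) / n, var_floor)
            for col, m in zip(columns, means)
        ]
        model[label] = (math.log(n / len(rows)), means, variances)
    return model


def predict_gaussian_nb(model, row):
    """Pick the class maximizing log prior + summed Gaussian log densities."""
    def score(params):
        log_prior, means, variances = params
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
            for x, m, v in zip(row, means, variances)
        )
    return max(model, key=lambda label: score(model[label]))
```

The only change relative to the discrete variants is the per-feature likelihood 
term; the prior and the argmax over classes stay the same.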

I agree we should think about the API carefully to encourage people to use NB 
correctly.  The items to decide are:
* Should we only support discrete variables?  If so, then there is nothing to 
discuss.
* If we support continuous variables (and I would argue we should), then we 
must design a good API.  No matter what API we design, people may misuse it by 
treating a categorical feature as continuous.  Documentation is necessary and 
helpful, but a good API is important too.  That API could take several forms:
** a single NaiveBayes class with a Factor type parameter
** multiple NaiveBayes classes for different variable types and/or distributions
** a better way of specifying types, such as enumerations for discrete variables
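
To make the first option more concrete, here is a minimal sketch of one 
NaiveBayes class parameterized by per-feature factors.  It is written in plain 
Python for brevity and is not a proposal for the actual spark.ml signatures; 
Factor, BernoulliFactor, GaussianFactor, and the factor_types argument are all 
hypothetical names.  The point it illustrates is that a typed factor can reject 
data that does not match its declared variable type:

```python
import math
from typing import Callable, Protocol, Sequence


class Factor(Protocol):
    """Hypothetical per-feature distribution, fitted on one class's values."""
    def fit(self, values: Sequence[float]) -> None: ...
    def log_prob(self, value: float) -> float: ...


class BernoulliFactor:
    """Binary feature with add-one smoothing; rejects non-binary input."""
    def fit(self, values):
        if any(v not in (0, 1) for v in values):
            raise ValueError("BernoulliFactor expects 0/1 values")
        self.p = (sum(values) + 1.0) / (len(values) + 2.0)

    def log_prob(self, value):
        if value not in (0, 1):
            raise ValueError("BernoulliFactor expects 0/1 values")
        return math.log(self.p if value == 1 else 1.0 - self.p)


class GaussianFactor:
    """Continuous feature modeled by a single Gaussian."""
    def fit(self, values):
        n = len(values)
        self.mean = sum(values) / n
        self.var = max(sum((v - self.mean) ** 2 for v in values) / n, 1e-9)

    def log_prob(self, value):
        return (-0.5 * math.log(2 * math.pi * self.var)
                - (value - self.mean) ** 2 / (2 * self.var))


class NaiveBayes:
    """One classifier; the caller declares a factor type for each feature."""
    def __init__(self, factor_types: Sequence[Callable[[], Factor]]):
        self.factor_types = list(factor_types)

    def fit(self, rows, labels):
        self.log_priors, self.factors = {}, {}
        for label in set(labels):
            members = [r for r, l in zip(rows, labels) if l == label]
            self.log_priors[label] = math.log(len(members) / len(rows))
            fitted = []
            for make, column in zip(self.factor_types, zip(*members)):
                factor = make()
                factor.fit(column)  # raises if the data does not fit the type
                fitted.append(factor)
            self.factors[label] = fitted
        return self

    def predict(self, row):
        def score(label):
            return self.log_priors[label] + sum(
                f.log_prob(x) for f, x in zip(self.factors[label], row))
        return max(self.log_priors, key=score)
```

Usage would look like NaiveBayes([BernoulliFactor, GaussianFactor]).fit(rows, 
labels).  Passing a value such as 0.5 to a feature declared as Bernoulli fails 
at fit time, but nothing in this sketch stops a user from declaring a 
categorical feature as Gaussian, which is the misuse that documentation would 
still have to cover.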

In short, supporting continuous features means we will need to think about 
feature types, and I agree the design should be thought through carefully so 
that it does not become too complex.

> Add Bernoulli-variant of Naive Bayes
> ------------------------------------
>
>                 Key: SPARK-4894
>                 URL: https://issues.apache.org/jira/browse/SPARK-4894
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>    Affects Versions: 1.2.0
>            Reporter: RJ Nowling
>            Assignee: RJ Nowling
>
> MLlib only supports the multinomial-variant of Naive Bayes.  The Bernoulli 
> version of Naive Bayes is more useful for situations where the features are 
> binary values.


