[ https://issues.apache.org/jira/browse/SPARK-4894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277933#comment-14277933 ]
Joseph K. Bradley commented on SPARK-4894:
------------------------------------------

+1 for small changes, but occasionally larger ones are needed. The jump to the spark.ml package might be a good time to make such a switch.

We don't want to try to approach the complexity of Factorie, but it should not be too hard to support continuous variables. Gaussian NB would surely be the first generalization to add.

I agree we should think about the API carefully to encourage people to use NB correctly. The items to decide are:
* Should we only support discrete variables? If so, then there is nothing to discuss.
* If we support continuous variables (and I would argue we should), then we must design a good API. No matter what API we design, people may misuse it by treating a categorical feature as continuous. Documentation is necessary and helpful, but a good API is important too. That API could take several forms:
** 1 NaiveBayes class, with a Factor type parameter
** multiple NaiveBayes classes for different variable types and/or distributions
** a better way of specifying types, such as enumerations for discrete variables

Basically, to support continuous features, we will need to think about types, and I agree it should be thought out so it does not become too complex.

> Add Bernoulli-variant of Naive Bayes
> ------------------------------------
>
>                 Key: SPARK-4894
>                 URL: https://issues.apache.org/jira/browse/SPARK-4894
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>    Affects Versions: 1.2.0
>            Reporter: RJ Nowling
>            Assignee: RJ Nowling
>
> MLlib only supports the multinomial-variant of Naive Bayes. The Bernoulli
> version of Naive Bayes is more useful for situations where the features are
> binary values.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
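
To make the API options in the comment more concrete, here is a minimal sketch of the "enumeration for feature types" idea: each feature is declared as binary or continuous, binary features get a Bernoulli likelihood, and continuous features get a Gaussian one. This is plain Python for illustration only, not Spark or MLlib code. The names FeatureType, MixedNaiveBayes, smoothing and min_variance are hypothetical and do not come from the issue or from any existing library.

```python
import math
from collections import defaultdict
from enum import Enum


class FeatureType(Enum):
    """Hypothetical per-feature type declaration."""
    BINARY = "binary"          # modeled with a Bernoulli likelihood
    CONTINUOUS = "continuous"  # modeled with a Gaussian likelihood


class MixedNaiveBayes:
    """Illustrative naive Bayes with one declared type per feature."""

    def __init__(self, feature_types, smoothing=1.0, min_variance=1e-9):
        self.feature_types = list(feature_types)
        self.smoothing = smoothing        # additive smoothing for Bernoulli
        self.min_variance = min_variance  # floor to avoid division by zero

    def _check(self, row):
        # Reject rows that contradict the declared types, so a value such
        # as 2 cannot silently pass as a binary feature.
        if len(row) != len(self.feature_types):
            raise ValueError("row length does not match feature_types")
        for value, ftype in zip(row, self.feature_types):
            if ftype is FeatureType.BINARY and value not in (0, 1):
                raise ValueError(f"binary feature got value {value!r}")

    def fit(self, X, y):
        rows_by_label = defaultdict(list)
        for row, label in zip(X, y):
            self._check(row)
            rows_by_label[label].append(row)

        self.log_prior = {}
        self.params = {}
        for label, rows in rows_by_label.items():
            self.log_prior[label] = math.log(len(rows) / len(y))
            per_feature = []
            for j, ftype in enumerate(self.feature_types):
                column = [row[j] for row in rows]
                if ftype is FeatureType.BINARY:
                    # Smoothed estimate of P(feature = 1 | label).
                    p = (sum(column) + self.smoothing) / (
                        len(column) + 2 * self.smoothing)
                    per_feature.append(p)
                else:
                    mean = sum(column) / len(column)
                    var = sum((v - mean) ** 2 for v in column) / len(column)
                    per_feature.append((mean, max(var, self.min_variance)))
            self.params[label] = per_feature
        return self

    def _log_likelihood(self, row, label):
        total = 0.0
        for value, ftype, param in zip(
                row, self.feature_types, self.params[label]):
            if ftype is FeatureType.BINARY:
                total += math.log(param if value == 1 else 1.0 - param)
            else:
                mean, var = param
                total += (-0.5 * math.log(2 * math.pi * var)
                          - (value - mean) ** 2 / (2 * var))
        return total

    def predict(self, row):
        self._check(row)
        return max(
            self.log_prior,
            key=lambda c: self.log_prior[c] + self._log_likelihood(row, c))


# Usage: one binary feature and one continuous feature.
X = [[1, 5.0], [1, 5.5], [0, 1.0], [0, 1.2]]
y = ["a", "a", "b", "b"]
model = MixedNaiveBayes(
    [FeatureType.BINARY, FeatureType.CONTINUOUS]).fit(X, y)
print(model.predict([1, 5.2]))
```

Declaring the types up front lets the model reject inputs that contradict them, which is one way an API can discourage misuse. It cannot catch a categorical feature that the caller declares as continuous, so documentation remains necessary.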