[
https://issues.apache.org/jira/browse/SPARK-15153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yanbo Liang updated SPARK-15153:
--------------------------------
Description:
When the label column of the dataset is numeric, SparkR spark.naiveBayes
throws an error during training. The bug is easy to reproduce:
{code}
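# Flatten the built-in Titanic table into a data.frame
t <- as.data.frame(Titanic)
# Keep rows with a positive count and drop the Freq column (column 5)
t1 <- t[t$Freq > 0, -5]
# Add a numeric 0/1 label derived from the string column Survived
t1$NumericSurvived <- ifelse(t1$Survived == "No", 0, 1)
# Drop the original string label (column 4) so only the numeric one remains
t2 <- t1[-4]
df <- suppressWarnings(createDataFrame(sqlContext, t2))
m <- spark.naiveBayes(df, NumericSurvived ~ .)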
16/05/05 03:26:17 ERROR RBackendHandler: fit on org.apache.spark.ml.r.NaiveBayesWrapper failed
Error in invokeJava(isStatic = TRUE, className, methodName, ...) :
  java.lang.ClassCastException: org.apache.spark.ml.attribute.UnresolvedAttribute$ cannot be cast to org.apache.spark.ml.attribute.NominalAttribute
    at org.apache.spark.ml.r.NaiveBayesWrapper$.fit(NaiveBayesWrapper.scala:66)
    at org.apache.spark.ml.r.NaiveBayesWrapper.fit(NaiveBayesWrapper.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.api.r.RBackendHandler.handleMethodCall(RBackendHandler.scala:141)
    at org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.scala:86)
    at org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.scala:38)
    at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
    at io.netty.channel.AbstractChannelHandlerContext.invo
{code}
In RFormula, the response variable can be numeric or string. If it is a
string, RFormula transforms it to DoubleType using StringIndexer; otherwise,
RFormula uses it directly for model training (and assumes it is numbered
0, ..., maxLabelIndex). The stack trace suggests that a numeric response
carries no nominal label metadata, which is why the cast to NominalAttribute
fails. When we extract labels in the SparkR naiveBayes wrapper, we should
handle them according to the type of the response variable (string or
numeric).
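As a rough illustration of the type-dependent handling described above, here
is a minimal base-R sketch. It does not call Spark at all: extract_labels is
a hypothetical helper, not part of SparkR or of NaiveBayesWrapper, and
ordering string labels by descending frequency is an assumption meant to
mimic StringIndexer, which gives the most frequent value index 0.

```r
# Hypothetical helper (not a SparkR API): return the label names that a
# wrapper would need to report for a given response column.
extract_labels <- function(response) {
  if (is.character(response) || is.factor(response)) {
    # String response: mimic StringIndexer, where the most frequent
    # value gets index 0, the next most frequent gets index 1, and so on.
    counts <- sort(table(as.character(response)), decreasing = TRUE)
    names(counts)
  } else if (is.numeric(response)) {
    # Numeric response: used as-is and assumed to already be numbered
    # 0, ..., maxLabelIndex, so the labels are just those indices.
    as.character(seq(0, max(response)))
  } else {
    stop("unsupported response type: ", class(response)[1])
  }
}

extract_labels(c("No", "No", "Yes"))  # string labels, ordered by frequency
extract_labels(c(0, 1, 1, 0))         # numeric labels 0, ..., maxLabelIndex
```

The point of the sketch is only the branch on the response type: the string
case can rely on indexer-style metadata, while the numeric case has none and
must derive the labels from the values themselves.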
was:
When the type of label of dataset is numeric, SparkR spark.naiveBayes will
throw error when training. This bug is easy to reproduce:
{code}
t <- as.data.frame(Titanic)
t1 <- t[t$Freq > 0, -5]
t1$NumericSurvived <- ifelse(t1$Survived == "No", 0, 1)
t2 <- t1[-4]
df <- suppressWarnings(createDataFrame(sqlContext, t2))
m <- spark.naiveBayes(df, NumericSurvived ~ .)
16/05/05 03:26:17 ERROR RBackendHandler: fit on org.apache.spark.ml.r.NaiveBayesWrapper failed
Error in invokeJava(isStatic = TRUE, className, methodName, ...) :
  java.lang.ClassCastException: org.apache.spark.ml.attribute.UnresolvedAttribute$ cannot be cast to org.apache.spark.ml.attribute.NominalAttribute
    at org.apache.spark.ml.r.NaiveBayesWrapper$.fit(NaiveBayesWrapper.scala:66)
    at org.apache.spark.ml.r.NaiveBayesWrapper.fit(NaiveBayesWrapper.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.api.r.RBackendHandler.handleMethodCall(RBackendHandler.scala:141)
    at org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.scala:86)
    at org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.scala:38)
    at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
    at io.netty.channel.AbstractChannelHandlerContext.invo
{code}
In RFormula, the response variable type can be numeric or string. If it's
string, RFormula will transform it to DoubleType by StringIndexer; otherwise,
RFormula will assume it number from 0, ..., maxLabelIndex. We should use
different methods to extract labels from the label column metadata.
> SparkR spark.naiveBayes error when label is numeric type
> --------------------------------------------------------
>
> Key: SPARK-15153
> URL: https://issues.apache.org/jira/browse/SPARK-15153
> Project: Spark
> Issue Type: Bug
> Components: ML, SparkR
> Reporter: Yanbo Liang