[
https://issues.apache.org/jira/browse/SPARK-16064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Zhang Mengqi updated SPARK-16064:
---------------------------------
Description:
This happens when users run GLM with SparkR; the same dataset fits without
problems with glm in native R.
When users fit the model using glm with the poisson family, it fails with an
assertion error caused by NA values produced by the reweight function.
16/06/20 16:40:22 ERROR RBackendHandler: fit on org.apache.spark.ml.r.GeneralizedLinearRegressionWrapper failed
Error in invokeJava(isStatic = TRUE, className, methodName, ...) :
java.lang.AssertionError: assertion failed: Sum of weights cannot be zero.
    at scala.Predef$.assert(Predef.scala:170)
    at org.apache.spark.ml.optim.WeightedLeastSquares$Aggregator.validate(WeightedLeastSquares.scala:248)
    at org.apache.spark.ml.optim.WeightedLeastSquares.fit(WeightedLeastSquares.scala:82)
    at org.apache.spark.ml.optim.IterativelyReweightedLeastSquares.fit(IterativelyReweightedLeastSquares.scala:85)
    at org.apache.spark.ml.regression.GeneralizedLinearRegression.train(GeneralizedLinearRegression.scala:276)
    at org.apache.spark.ml.regression.GeneralizedLinearRegression.train(GeneralizedLinearRegression.scala:134)
    at org.apache.spark.ml.Predictor.fit(Predictor.scala:90)
    at org.apache.spark.ml.Predictor.fit(Predictor.scala:71)
    at org.apache.spark.ml.Pipeline$$anonfun$fit$2.apply(Pipeline.scala:148)
    at org.apache.spark.ml.Pipeline$$anonfun$fit$2.apply(Pipeline.scala:144)
    at scala.collection.Iterator$class.foreach(Iterator.scala:893)
    at scala.collection.Abstra
P.S. The dataset describes ride flows between several planning areas in
Singapore.
ride_flow_exp <- glm(flow ~ Origin + Destination + distance, data = ride_flow,
                     family = poisson(link = "log"))
SparkDataFrame[Origin:string, Destination:string, flow:double, Oi:int, Dj:int,
distance:double]
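For illustration only, here is a minimal Python sketch of one way a reweight step could end up with no usable weights. It uses the textbook IRLS formulas for a Poisson model with log link (mu = exp(eta), working weight = mu), not Spark's actual code. The function names, the eta values and the eps clamp are assumptions made up for this example, and nothing in this report confirms that underflow is the real cause here.

```python
import math


def poisson_log_reweight(eta, y):
    """One textbook IRLS reweight step for a Poisson GLM with log link.

    Returns (working_response, working_weight) for a single observation.
    This is an illustrative sketch, not the Spark implementation.
    """
    mu = math.exp(eta)  # inverse log link; underflows to 0.0 for very negative eta
    weight = mu         # (dmu/deta)^2 / Var(mu) = mu^2 / mu = mu
    if mu == 0.0:
        # (y - mu) / mu is undefined when mu is exactly zero, so the
        # working response is marked as NaN (an "NA" in R terms).
        z = math.nan
    else:
        z = eta + (y - mu) / mu
    return z, weight


def clamped_reweight(eta, y, eps=1e-16):
    """Same step, but mu is kept away from zero by a hypothetical eps clamp."""
    mu = max(math.exp(eta), eps)
    weight = mu
    z = eta + (y - mu) / mu
    return z, weight


if __name__ == "__main__":
    # Hypothetical linear predictors that have drifted far into the negatives.
    etas = [-800.0, -900.0, -1000.0]
    ys = [3.0, 1.0, 2.0]

    plain = [poisson_log_reweight(e, y) for e, y in zip(etas, ys)]
    print("sum of weights without clamp:", sum(w for _, w in plain))

    clamped = [clamped_reweight(e, y) for e, y in zip(etas, ys)]
    print("sum of weights with clamp:", sum(w for _, w in clamped))
```

In this sketch every weight underflows to zero, so a weighted least squares solver that requires a positive weight sum would reject the input. Clamping mu is only one possible mitigation and would need to be checked against the real reweight function.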
> Fix the GLM error caused by NA produced by reweight function
> ------------------------------------------------------------
>
> Key: SPARK-16064
> URL: https://issues.apache.org/jira/browse/SPARK-16064
> Project: Spark
> Issue Type: Bug
> Components: ML
> Affects Versions: 2.0.0
> Reporter: Zhang Mengqi
> Priority: Minor