[jira] [Updated] (SPARK-16064) Fix the GLM error caused by NA produced by reweight function

Zhang Mengqi (JIRA) Mon, 20 Jun 2016 04:40:48 -0700

     [ https://issues.apache.org/jira/browse/SPARK-16064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Zhang Mengqi updated SPARK-16064:
---------------------------------
    Description: 
This happens when users fit a GLM with SparkR; the same dataset fits without 
error using glm in native R.
When users fit the model using glm with the poisson family, it fails with an 
assertion error caused by NA values produced by the reweight function.

16/06/20 16:40:22 ERROR RBackendHandler: fit on org.apache.spark.ml.r.GeneralizedLinearRegressionWrapper failed
Error in invokeJava(isStatic = TRUE, className, methodName, ...) : 
  java.lang.AssertionError: assertion failed: Sum of weights cannot be zero.
        at scala.Predef$.assert(Predef.scala:170)
        at org.apache.spark.ml.optim.WeightedLeastSquares$Aggregator.validate(WeightedLeastSquares.scala:248)
        at org.apache.spark.ml.optim.WeightedLeastSquares.fit(WeightedLeastSquares.scala:82)
        at org.apache.spark.ml.optim.IterativelyReweightedLeastSquares.fit(IterativelyReweightedLeastSquares.scala:85)
        at org.apache.spark.ml.regression.GeneralizedLinearRegression.train(GeneralizedLinearRegression.scala:276)
        at org.apache.spark.ml.regression.GeneralizedLinearRegression.train(GeneralizedLinearRegression.scala:134)
        at org.apache.spark.ml.Predictor.fit(Predictor.scala:90)
        at org.apache.spark.ml.Predictor.fit(Predictor.scala:71)
        at org.apache.spark.ml.Pipeline$$anonfun$fit$2.apply(Pipeline.scala:148)
        at org.apache.spark.ml.Pipeline$$anonfun$fit$2.apply(Pipeline.scala:144)
        at scala.collection.Iterator$class.foreach(Iterator.scala:893)
        at scala.collection.Abstra

P.S. The dataset describes ride flows between several planning areas in 
Singapore.

ride_flow_exp <- glm(flow ~ Origin + Destination + distance, ride_flow,
                     family = poisson(link = "log"))

SparkDataFrame[Origin:string, Destination:string, flow:double, Oi:int, Dj:int, distance:double]
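
The failure mode described above can be illustrated outside Spark. The 
following is a minimal sketch in plain Python, not Spark's implementation. It 
assumes the textbook IRLS working weight for a Poisson GLM with log link 
(prior weight times mu, with mu = exp(eta)) and assumes the weight check 
behaves like a "sum > 0" comparison. The function names are hypothetical. 
It shows how a single NA/NaN entering the reweight step makes the weight sum 
NaN, which then fails a positivity check even though no weight is zero.

```python
import math


def poisson_log_working_weights(eta, prior_weights):
    """Textbook IRLS working weights for a Poisson GLM with log link.

    For this family/link pair the working weight is prior_weight * mu,
    where mu = exp(eta). This is an illustrative stand-in, not Spark code.
    """
    # math.exp propagates NaN, so a NaN linear predictor gives a NaN weight.
    return [w * math.exp(e) for e, w in zip(eta, prior_weights)]


def weight_sum_is_valid(weights):
    """Assumed form of the check: the sum of weights must be positive.

    Any comparison with NaN is False, so a NaN sum fails the check just
    like a zero sum would.
    """
    return sum(weights) > 0.0


# All linear predictors finite: the weights are positive and the check passes.
good = poisson_log_working_weights([0.0, 1.0], [1.0, 1.0])

# One NaN linear predictor: the sum becomes NaN and the check fails.
bad = poisson_log_working_weights([0.0, float("nan")], [1.0, 1.0])

print(weight_sum_is_valid(good))
print(weight_sum_is_valid(bad))
```

Under these assumptions, an error message about a zero weight sum can be 
reported when the underlying problem is a NaN weight rather than a zero one.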

  was:

When users fit the model using glm with the poisson family, it fails with an 
assertion error caused by NA values produced by the reweight function.

16/06/20 16:40:22 ERROR RBackendHandler: fit on org.apache.spark.ml.r.GeneralizedLinearRegressionWrapper failed
Error in invokeJava(isStatic = TRUE, className, methodName, ...) : 
  java.lang.AssertionError: assertion failed: Sum of weights cannot be zero.
        at scala.Predef$.assert(Predef.scala:170)
        at org.apache.spark.ml.optim.WeightedLeastSquares$Aggregator.validate(WeightedLeastSquares.scala:248)
        at org.apache.spark.ml.optim.WeightedLeastSquares.fit(WeightedLeastSquares.scala:82)
        at org.apache.spark.ml.optim.IterativelyReweightedLeastSquares.fit(IterativelyReweightedLeastSquares.scala:85)
        at org.apache.spark.ml.regression.GeneralizedLinearRegression.train(GeneralizedLinearRegression.scala:276)
        at org.apache.spark.ml.regression.GeneralizedLinearRegression.train(GeneralizedLinearRegression.scala:134)
        at org.apache.spark.ml.Predictor.fit(Predictor.scala:90)
        at org.apache.spark.ml.Predictor.fit(Predictor.scala:71)
        at org.apache.spark.ml.Pipeline$$anonfun$fit$2.apply(Pipeline.scala:148)
        at org.apache.spark.ml.Pipeline$$anonfun$fit$2.apply(Pipeline.scala:144)
        at scala.collection.Iterator$class.foreach(Iterator.scala:893)
        at scala.collection.Abstra






> Fix the GLM error caused by NA produced by reweight function
> ------------------------------------------------------------
>
>                 Key: SPARK-16064
>                 URL: https://issues.apache.org/jira/browse/SPARK-16064
>             Project: Spark
>          Issue Type: Bug
>          Components: ML
>    Affects Versions: 2.0.0
>            Reporter: Zhang Mengqi
>            Priority: Minor



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
