[https://issues.apache.org/jira/browse/SPARK-14183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15213488#comment-15213488]
Sean Owen commented on SPARK-14183:
-----------------------------------
It doesn't really make sense to build a model on a single element, but the
error should be clearer in any event. There is already a batch of error
checking around LogisticRegression.scala:294; it could also verify that the
summarizers observed at least one data point. The problem here is that the
summarizer observes no data points at all.
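A sketch of the kind of guard this suggests. The names here are illustrative stand-ins, not Spark's actual internals; in particular, `distinctLabels` stands in for whatever label counts `MultiClassSummarizer` accumulates:

```scala
// Hypothetical sketch of the suggested check (not Spark's real code).
// `distinctLabels` models the label histogram a MultiClassSummarizer
// would have accumulated over the training data.
def numClasses(distinctLabels: Map[Double, Long]): Int = {
  require(distinctLabels.nonEmpty,
    "No data points were observed by the summarizer: the training set " +
    "(or the current cross-validation fold) is empty.")
  // Same computation as before, but now guaranteed never to call .max
  // on an empty collection.
  distinctLabels.keys.max.toInt + 1
}
```

The `require` turns the opaque `empty.max` into an `IllegalArgumentException` with an actionable message.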
> UnsupportedOperationException: empty.max when fitting CrossValidator model
> ---------------------------------------------------------------------------
>
> Key: SPARK-14183
> URL: https://issues.apache.org/jira/browse/SPARK-14183
> Project: Spark
> Issue Type: Improvement
> Components: ML
> Affects Versions: 2.0.0
> Reporter: Jacek Laskowski
> Priority: Minor
>
> The following code produces {{java.lang.UnsupportedOperationException: empty.max}}, but the error should have said what might have caused it or how to fix it.
> The exception:
> {code}
> scala> val model = cv.fit(df)
> java.lang.UnsupportedOperationException: empty.max
> at scala.collection.TraversableOnce$class.max(TraversableOnce.scala:227)
> at scala.collection.AbstractTraversable.max(Traversable.scala:104)
> at org.apache.spark.ml.classification.MultiClassSummarizer.numClasses(LogisticRegression.scala:739)
> at org.apache.spark.ml.classification.MultiClassSummarizer.histogram(LogisticRegression.scala:743)
> at org.apache.spark.ml.classification.LogisticRegression.train(LogisticRegression.scala:288)
> at org.apache.spark.ml.classification.LogisticRegression.train(LogisticRegression.scala:261)
> at org.apache.spark.ml.classification.LogisticRegression.train(LogisticRegression.scala:160)
> at org.apache.spark.ml.Predictor.fit(Predictor.scala:90)
> at org.apache.spark.ml.Predictor.fit(Predictor.scala:71)
> at org.apache.spark.ml.Estimator.fit(Estimator.scala:59)
> at org.apache.spark.ml.Estimator$$anonfun$fit$1.apply(Estimator.scala:78)
> at org.apache.spark.ml.Estimator$$anonfun$fit$1.apply(Estimator.scala:78)
> at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
> at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
> at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
> at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
> at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
> at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
> at org.apache.spark.ml.Estimator.fit(Estimator.scala:78)
> at org.apache.spark.ml.tuning.CrossValidator$$anonfun$fit$1.apply(CrossValidator.scala:110)
> at org.apache.spark.ml.tuning.CrossValidator$$anonfun$fit$1.apply(CrossValidator.scala:105)
> at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
> at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
> at org.apache.spark.ml.tuning.CrossValidator.fit(CrossValidator.scala:105)
> ... 55 elided
> {code}
> The code:
> {code}
> import org.apache.spark.ml.tuning._
> val cv = new CrossValidator
> import org.apache.spark.mllib.linalg._
> val features = Vectors.sparse(3, Array(1), Array(1d))
> val df = Seq((0, "hello world", 0d, features)).toDF("id", "text", "label", "features")
> import org.apache.spark.ml.classification._
> val lr = new LogisticRegression()
> import org.apache.spark.ml.evaluation.RegressionEvaluator
> val regEval = new RegressionEvaluator()
> val paramGrid = new ParamGridBuilder().build()
> cv.setEstimatorParamMaps(paramGrid).setEstimator(lr).setEvaluator(regEval)
> val model = cv.fit(df)
> {code}
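For what it's worth, the exception itself is standard Scala behaviour: `max` on an empty collection throws, and with a single-row DataFrame split across cross-validation folds, a training split can end up empty, so the summarizer sees no labels at all. A plain-Scala illustration, no Spark required (`labels` is a stand-in for the labels one empty training fold would contribute):

```scala
// .max on an empty collection throws UnsupportedOperationException,
// which is exactly the "empty.max" in the stack trace above.
val labels = Seq.empty[Double]

val threw =
  try { labels.max; false }
  catch { case _: UnsupportedOperationException => true }

// A defensive alternative yields an Option instead of throwing:
val safeMax: Option[Double] = labels.reduceOption(math.max)  // None here
```

Hence the suggestion above: check that the summarizer observed at least one point before computing the histogram, and fail with a message that names the real problem.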
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)