[ https://issues.apache.org/jira/browse/SPARK-38588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
zhengruifeng updated SPARK-38588:
---------------------------------
    Summary: Validate input dataset of ml.classification  (was: Validate input dataset of LinearSVC)

> Validate input dataset of ml.classification
> -------------------------------------------
>
>                 Key: SPARK-38588
>                 URL: https://issues.apache.org/jira/browse/SPARK-38588
>             Project: Spark
>          Issue Type: Sub-task
>          Components: ML
>    Affects Versions: 3.4.0
>            Reporter: zhengruifeng
>            Priority: Major
>
> LinearSVC should fail fast if the input dataset contains invalid values,
> such as NaN or infinite feature values. Currently fit() succeeds on such
> input and returns a model whose intercept and coefficients are NaN:
>
> {code:java}
> // Run in spark-shell, which provides sc and the implicits needed by toDF().
> import org.apache.spark.ml.feature._
> import org.apache.spark.ml.linalg._
> import org.apache.spark.ml.classification._
> import org.apache.spark.ml.clustering._
>
> // Two rows: one with a NaN feature, one with an infinite feature.
> val df = sc.parallelize(Seq(
>   LabeledPoint(1.0, Vectors.dense(1.0, Double.NaN)),
>   LabeledPoint(0.0, Vectors.dense(Double.PositiveInfinity, 2.0)))).toDF()
>
> val svc = new LinearSVC()
> val model = svc.fit(df)  // succeeds instead of failing
>
> scala> model.intercept
> res0: Double = NaN
>
> scala> model.coefficients
> res1: org.apache.spark.ml.linalg.Vector = [NaN,NaN]
> {code}
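
For illustration only, the fail-fast behaviour requested above could be sketched in plain Scala as follows. This is not Spark's implementation: the object InputValidation and its methods validate and requireValid are hypothetical names, rows are modelled as (label, features) pairs rather than a DataFrame, and the rule that labels must be 0 or 1 is an assumption based on LinearSVC being a binary classifier.

```scala
// Hypothetical sketch; not part of Spark. Rows are (label, features) pairs.
object InputValidation {

  /** Returns None if the row is valid, otherwise a description of the problem. */
  def validate(label: Double, features: Array[Double]): Option[String] = {
    // Assumption: binary labels only. A NaN label also fails this check,
    // because NaN compares unequal to every value.
    if (label != 0.0 && label != 1.0) {
      Some(s"label must be 0 or 1, but got $label")
    } else {
      // Find the first feature that is NaN or infinite, if any.
      val idx = features.indexWhere { v =>
        java.lang.Double.isNaN(v) || java.lang.Double.isInfinite(v)
      }
      if (idx >= 0) Some(s"feature at index $idx is invalid: ${features(idx)}")
      else None
    }
  }

  /** Throws IllegalArgumentException on the first invalid row. */
  def requireValid(rows: Seq[(Double, Array[Double])]): Unit =
    rows.zipWithIndex.foreach { case ((label, features), i) =>
      validate(label, features).foreach { msg =>
        throw new IllegalArgumentException(s"Row $i: $msg")
      }
    }
}
```

Applied to the two rows from the report, requireValid would throw an IllegalArgumentException for the first row (NaN feature at index 1) before any training starts, rather than producing NaN coefficients.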