[ https://issues.apache.org/jira/browse/SPARK-22700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joseph K. Bradley updated SPARK-22700:
--------------------------------------
    Fix Version/s: 2.2.2

> Bucketizer.transform incorrectly drops row containing NaN
> ---------------------------------------------------------
>
>                 Key: SPARK-22700
>                 URL: https://issues.apache.org/jira/browse/SPARK-22700
>             Project: Spark
>          Issue Type: Bug
>          Components: ML
>    Affects Versions: 2.2.0, 2.3.0
>            Reporter: zhengruifeng
>            Assignee: zhengruifeng
>            Priority: Major
>             Fix For: 2.2.2, 2.3.0
>
>
> {code}
> import org.apache.spark.ml.feature._
> val df = spark.createDataFrame(Seq((2.3, 3.0), (Double.NaN, 3.0), (6.7, Double.NaN))).toDF("a", "b")
> val splits = Array(Double.NegativeInfinity, 3.0, Double.PositiveInfinity)
> val bucketizer: Bucketizer = new Bucketizer().setInputCol("a").setOutputCol("aa").setSplits(splits)
> bucketizer.setHandleInvalid("skip")
> scala> df.show
> +---+---+
> |  a|  b|
> +---+---+
> |2.3|3.0|
> |NaN|3.0|
> |6.7|NaN|
> +---+---+
> scala> bucketizer.transform(df).show
> +---+---+---+
> |  a|  b| aa|
> +---+---+---+
> |2.3|3.0|0.0|
> +---+---+---+
> {code}
> When {{handleInvalid}} is set to {{skip}}, the last row of the input
> (6.7, NaN) is incorrectly dropped, even though column 'b' is not an input
> column. Only the second row, whose input column 'a' is NaN, should be skipped.
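The difference between the reported and the expected row filtering can be sketched outside of Bucketizer with `DataFrameNaFunctions.drop`. This is only an illustration of the two behaviours on the DataFrame from the report; it is an assumption, not something stated in this message, that the `skip` path filters rows in this way, and the names `allColumns` and `inputOnly` are made up for the example.

```scala
import org.apache.spark.sql.SparkSession

// In spark-shell, getOrCreate() returns the existing session.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("SPARK-22700-sketch")
  .getOrCreate()

// Same data as in the report: NaN in "a" on row 2, NaN in "b" on row 3.
val df = spark.createDataFrame(
  Seq((2.3, 3.0), (Double.NaN, 3.0), (6.7, Double.NaN))
).toDF("a", "b")

// Dropping rows with null/NaN in ANY column matches the reported output:
// only (2.3, 3.0) survives.
val allColumns = df.na.drop()

// Restricting the check to the input column "a" keeps (6.7, NaN),
// which is the behaviour the report expects from handleInvalid = "skip".
val inputOnly = df.na.drop(Seq("a"))

allColumns.show()
inputOnly.show()
```

With the restriction to column "a", two rows remain, (2.3, 3.0) and (6.7, NaN); without it, only the first row remains.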