Bago Amirbekian created SPARK-23377:

             Summary: Bucketizer with multiple columns persistence bug
                 Key: SPARK-23377
             Project: Spark
          Issue Type: Bug
          Components: ML
    Affects Versions: 2.3.0
            Reporter: Bago Amirbekian

A Bucketizer with multiple input/output columns gets "inputCol" set to its 
default value on write -> read, which causes it to throw an error on transform. 
Here's an example:


import org.apache.spark.ml.feature.Bucketizer

val splits = Array(Double.NegativeInfinity, 0, 10, 100, Double.PositiveInfinity)
val bucketizer = new Bucketizer()
  .setSplitsArray(Array(splits, splits))
  .setInputCols(Array("foo1", "foo2"))
  .setOutputCols(Array("bar1", "bar2"))

val data = Seq((1.0, 2.0), (10.0, 100.0), (101.0, -1.0)).toDF("foo1", "foo2")

val path = "/temp/bucketrizer-persist-test"
bucketizer.write.overwrite().save(path)
val bucketizerAfterRead = Bucketizer.read.load(path)

// This line throws an error because "outputCol" is set
bucketizerAfterRead.transform(data)
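To illustrate why transform fails after the round trip, here is a minimal, self-contained sketch of the single-column vs. multi-column exclusivity check (this is a hypothetical stand-in, not Spark's actual implementation; `ParamState` and `validateParams` are invented names). Transform rejects a Bucketizer that has both the single-column param and the multi-column param set, and the reader populating the single-column default is what trips it:

```scala
// Hypothetical model of the exclusivity check (assumption: names invented here).
case class ParamState(singleColSet: Boolean, multiColsSet: Boolean)

def validateParams(p: ParamState): Either[String, Unit] =
  if (p.singleColSet && p.multiColsSet)
    Left("Both single-column and multi-column params are set; only one is allowed")
  else Right(())

// Before write -> read: only the multi-column params are set, so the check passes.
val beforeSave = ParamState(singleColSet = false, multiColsSet = true)

// After read, the loader also filled in the single-column param with its
// default, so the same check now fails on transform.
val afterRead = ParamState(singleColSet = true, multiColsSet = true)
```

Under this sketch, `validateParams(beforeSave)` succeeds while `validateParams(afterRead)` fails, matching the observed behavior: the estimator is valid when constructed but invalid after persistence.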
