[ https://issues.apache.org/jira/browse/SPARK-20574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yanbo Liang closed SPARK-20574.
-------------------------------
Resolution: Fixed
Fix Version/s: 2.2.0
> Allow Bucketizer to handle non-Double column
> --------------------------------------------
>
> Key: SPARK-20574
> URL: https://issues.apache.org/jira/browse/SPARK-20574
> Project: Spark
> Issue Type: Bug
> Components: ML
> Affects Versions: 2.1.0
> Reporter: Wayne Zhang
> Assignee: Wayne Zhang
> Fix For: 2.2.0
>
>
> Bucketizer currently requires the input column to be of type Double, but its
> bucketing logic works for any numeric data type. Many practical problems
> involve integer or float data, and manually casting such columns to Double
> before calling Bucketizer quickly becomes tedious. This transformer could be
> extended to handle all numeric types.
> The example below shows Bucketizer failing on integer data.
> {code}
> import org.apache.spark.ml.feature.Bucketizer
> import spark.implicits._ // for toDF; assumes a SparkSession named spark, as in spark-shell
>
> val splits = Array(-3.0, 0.0, 3.0)
> val data: Array[Int] = Array(-2, -1, 0, 1, 2)
> val expectedBuckets = Array(0.0, 0.0, 1.0, 1.0, 1.0)
> val dataFrame = data.zip(expectedBuckets).toSeq.toDF("feature", "expected")
>
> val bucketizer = new Bucketizer()
>   .setInputCol("feature")
>   .setOutputCol("result")
>   .setSplits(splits)
>
> // Fails in 2.1.0 because "feature" is IntegerType, not DoubleType:
> bucketizer.transform(dataFrame)
> // java.lang.IllegalArgumentException: requirement failed: Column feature must
> // be of type DoubleType but was actually IntegerType.
> {code}
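The bucketing rule itself only compares a value against sorted split points, so nothing about it depends on DoubleType. Below is a minimal standalone sketch, not Spark's actual implementation: the object `NumericBucketing` and its `bucketOf` method are hypothetical names, and the sketch assumes Bucketizer's documented convention that the bucket defined by splits x, y holds values in [x, y), except the last bucket, which also includes y. Any type with a `scala.math.Numeric` instance (Int, Long, Float, Double, ...) is widened to Double once, then located with a binary search.

```scala
// Hypothetical helper for illustration only; not part of Spark's API.
object NumericBucketing {
  // Returns the bucket index (as a Double, like Bucketizer's output column)
  // for any numeric value. Bucket i covers [splits(i), splits(i + 1)), and the
  // last bucket also includes splits.last. Splits must be sorted ascending.
  def bucketOf[T](value: T, splits: Array[Double])(implicit num: Numeric[T]): Double = {
    require(splits.length >= 2, "need at least two split points")
    val x = num.toDouble(value) // widen Int, Long, Float, etc. to Double once

    if (x == splits.last) {
      // The upper bound belongs to the last bucket.
      (splits.length - 2).toDouble
    } else {
      // binarySearch returns the index on an exact hit, otherwise
      // -(insertionPoint) - 1; the bucket is then insertionPoint - 1.
      val idx = java.util.Arrays.binarySearch(splits, x)
      val bucket = if (idx >= 0) idx else -idx - 2
      // Values below splits.head, above splits.last, or NaN are rejected here.
      require(bucket >= 0 && bucket < splits.length - 1, s"value $x is out of range")
      bucket.toDouble
    }
  }

  def main(args: Array[String]): Unit = {
    val splits = Array(-3.0, 0.0, 3.0)
    val ints: Array[Int] = Array(-2, -1, 0, 1, 2)
    println(ints.map(v => bucketOf(v, splits)).mkString(", ")) // 0.0, 0.0, 1.0, 1.0, 1.0
    val floats: Array[Float] = Array(-2.5f, 2.5f)
    println(floats.map(v => bucketOf(v, splits)).mkString(", ")) // 0.0, 1.0
  }
}
```

With the integer data from the report, the sketch reproduces `expectedBuckets` without any manual cast, which is the behavior the issue asks Bucketizer itself to provide.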
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)