Barry Becker created SPARK-17086:
------------------------------------

             Summary: QuantileDiscretizer throws InvalidArgumentException 
(parameter splits given invalid value) on valid data
                 Key: SPARK-17086
                 URL: https://issues.apache.org/jira/browse/SPARK-17086
             Project: Spark
          Issue Type: Bug
          Components: ML
    Affects Versions: 2.1.0
            Reporter: Barry Becker


I discovered this bug when working with a build from the master branch (which I 
believe is 2.1.0). The same pipeline worked fine on Spark 1.6.2.

I have a dataframe with an "intData" column that has values like 
{code}
1 3 2 1 1 2 3 2 2 2 1 3
{code}
I have a stage in my pipeline that uses the QuantileDiscretizer to produce 
equal-weight splits, like this:
{code}
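// Requires Spark ML on the classpath; df is the DataFrame described above.
import org.apache.spark.ml.feature.QuantileDiscretizer

new QuantileDiscretizer()
        .setInputCol("intData")
        .setOutputCol("intData_bin")
        .setNumBuckets(10)
        .fit(df)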
{code}
But when that runs, it (incorrectly) throws this error:
{code}
parameter splits given invalid value [-Infinity, 1.0, 1.0, 2.0, 2.0, 3.0, 3.0, Infinity]
{code}
I don't think duplicate splits should be generated, should they?
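
A minimal sketch of why the reported value would be rejected, assuming the splits parameter is validated as a strictly increasing sequence (which would be consistent with the error above). The object and function names here are hypothetical and written only for illustration; this is not Spark's implementation, and it needs no Spark dependency.

```scala
// Hypothetical helpers for illustration only; not Spark's actual code.
object SplitsSketch {
  // True when there are at least three split points and each one is
  // strictly greater than the one before it.
  def strictlyIncreasing(splits: Array[Double]): Boolean =
    splits.length >= 3 && splits.sliding(2).forall { case Array(a, b) => a < b }

  // Drops repeated values while keeping the original (already sorted) order.
  def dedupe(splits: Array[Double]): Array[Double] = splits.distinct

  def main(args: Array[String]): Unit = {
    // The splits from the error message above.
    val reported = Array(Double.NegativeInfinity, 1.0, 1.0, 2.0, 2.0, 3.0, 3.0,
      Double.PositiveInfinity)
    println(strictlyIncreasing(reported))         // false: 1.0 is not < 1.0
    println(strictlyIncreasing(dedupe(reported))) // true: -Inf, 1, 2, 3, Inf
  }
}
```

With only three distinct values in the column, many of the requested quantiles land on the same value, so a de-duplication step like the one sketched would leave fewer buckets than setNumBuckets asked for but would yield a usable set of splits.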



