[jira] [Assigned] (SPARK-12874) ML StringIndexer does not protect itself from column name duplication

Apache Spark (JIRA) Thu, 25 Feb 2016 07:18:36 -0800

     [ https://issues.apache.org/jira/browse/SPARK-12874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Apache Spark reassigned SPARK-12874:
------------------------------------

    Assignee: Apache Spark

> ML StringIndexer does not protect itself from column name duplication
> ---------------------------------------------------------------------
>
>                 Key: SPARK-12874
>                 URL: https://issues.apache.org/jira/browse/SPARK-12874
>             Project: Spark
>          Issue Type: Bug
>          Components: ML
>    Affects Versions: 1.5.2, 1.6.0
>            Reporter: Wojciech Jurczyk
>            Assignee: Apache Spark
>
> StringIndexerModel.transform() does not check the schema of the input
> DataFrame. As a result, it can produce a DataFrame containing columns with
> duplicated names (for example, when the output column name matches a column
> that already exists in the input).
> This issue is similar to SPARK-12711. StringIndexer could use
> transformSchema to ensure that the input DataFrame schema is consistent with
> the parameter values.
> Please confirm, and I will prepare a PR to fix the bug.
> https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala#L147
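
The check proposed in the description can be sketched roughly as follows. This is a minimal illustration in plain Python, not Spark's Scala code: the function name `transform_schema`, its parameters, and the error messages are assumptions made for the example, and a schema is modelled as a simple list of column names.

```python
from typing import List


def transform_schema(field_names: List[str], input_col: str, output_col: str) -> List[str]:
    """Validate the column parameters against a schema and return the output schema.

    Hypothetical helper: the schema is modelled as a list of column names.
    """
    # The column to be indexed must be present in the input.
    if input_col not in field_names:
        raise ValueError(f"Input column {input_col!r} does not exist.")
    # Refuse to append a column whose name is already taken; this is the
    # check whose absence allows duplicated column names.
    if output_col in field_names:
        raise ValueError(f"Output column {output_col!r} already exists.")
    # Return a new list so the caller's schema is left unchanged.
    return field_names + [output_col]
```

Running such a check at the start of transform() would reject a conflicting output column name before any column is appended, instead of silently producing a duplicate.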



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

