[ https://issues.apache.org/jira/browse/SPARK-12874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Josh Rosen updated SPARK-12874:
-------------------------------
Fix Version/s: 1.6.1 (was: 1.6.2)
> ML StringIndexer does not protect itself from column name duplication
> ---------------------------------------------------------------------
>
> Key: SPARK-12874
> URL: https://issues.apache.org/jira/browse/SPARK-12874
> Project: Spark
> Issue Type: Bug
> Components: ML
> Affects Versions: 1.5.2, 1.6.0
> Reporter: Wojciech Jurczyk
> Assignee: Yu Ishikawa
> Fix For: 1.6.1, 2.0.0
>
>
> StringIndexerModel does not check the schema of the input DataFrame when
> performing transform(). As a result, it is possible to create a DataFrame
> containing columns with duplicated names.
> This issue is similar to SPARK-12711. StringIndexer could use
> transformSchema to ensure that the input DataFrame schema is consistent
> with the parameter values.
> Please confirm, and I will then prepare a PR to resolve the bug.
> https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala#L147
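
The kind of check proposed in the description can be sketched as follows. This is only an illustration under stated assumptions, not the code that was merged for this issue: `Field` is a hypothetical stand-in for Spark's schema field type so that the example runs without Spark on the classpath, and `validateAndTransformSchema` is an illustrative helper name with an assumed signature.

```scala
// Hypothetical stand-in for a schema field, so the sketch needs no Spark dependency.
final case class Field(name: String, dataType: String)

object SchemaCheckSketch {
  // Illustrative helper (name and signature are assumptions): validates the
  // input schema against the inputCol/outputCol parameter values and returns
  // the schema that transform() would produce.
  def validateAndTransformSchema(
      schema: Seq[Field],
      inputCol: String,
      outputCol: String): Seq[Field] = {
    // The column to be indexed must be present in the input.
    require(schema.exists(_.name == inputCol),
      s"Input column $inputCol does not exist.")
    // Refuse to append an output column whose name is already taken;
    // this is the duplication the issue describes.
    require(schema.forall(_.name != outputCol),
      s"Output column $outputCol already exists.")
    schema :+ Field(outputCol, "double")
  }
}
```

With a check like this, a call whose `outputCol` matches an existing column name fails with an `IllegalArgumentException` (thrown by `require`) instead of silently producing a schema with two columns of the same name.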