[ https://issues.apache.org/jira/browse/SPARK-12104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shivaram Venkataraman resolved SPARK-12104.
-------------------------------------------
    Resolution: Fixed
 Fix Version/s: 1.6.1
                2.0.0

Issue resolved by pull request 10118
[https://github.com/apache/spark/pull/10118]

> collect() does not handle multiple columns with same name
> ---------------------------------------------------------
>
>                 Key: SPARK-12104
>                 URL: https://issues.apache.org/jira/browse/SPARK-12104
>             Project: Spark
>          Issue Type: Bug
>          Components: SparkR
>    Affects Versions: 1.6.0
>            Reporter: Hossein Falaki
>            Priority: Critical
>             Fix For: 2.0.0, 1.6.1
>
>
> This is a regression from Spark 1.5.
> Spark can produce DataFrames with identically named columns (e.g., after left
> outer joins). In 1.5, when such a DataFrame was collected, we ended up with an
> R data.frame with modified column names:
> {code}
> > names(mySparkDF)
> [1] "date" "name" "name"
> > names(collect(mySparkDF))
> [1] "date" "name" "name.1"
> {code}
> But in 1.6, only the first of the duplicated columns is included in the
> collected R data.frame. I think SparkR should continue the old behavior.
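For reference, the 1.5-style names in the issue ("name" followed by "name.1") are exactly what base R produces when it de-duplicates a vector of names, either through make.unique() or through data.frame() with its default check.names = TRUE. The snippet below is a minimal sketch under that observation; dedupColNames() is a hypothetical helper, not part of SparkR, and this notification does not say how pull request 10118 actually renames the columns.

```r
# Hypothetical helper (not part of SparkR): rename duplicated column
# names the way base R does, appending ".1", ".2", ... to later repeats.
dedupColNames <- function(colNames) {
  make.unique(colNames, sep = ".")
}

# Column names as reported by names(mySparkDF) in the issue
sparkNames <- c("date", "name", "name")
print(dedupColNames(sparkNames))   # c("date", "name", "name.1")

# data.frame() applies the same kind of renaming by default
# (check.names = TRUE calls make.names(..., unique = TRUE)).
df <- data.frame(date = 1, name = "a", name = "b")
print(names(df))                   # c("date", "name", "name.1")
```

Both routes keep every column and only rename the repeats, which is the behavior the reporter asks SparkR to preserve.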