[ https://issues.apache.org/jira/browse/SPARK-16896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15413038#comment-15413038 ]
Hyukjin Kwon commented on SPARK-16896:
--------------------------------------

I don't mind if you go ahead (I was looking at this problem, though).

One thing I want to say is that we had better match the behaviour of [read.csv|https://stat.ethz.ch/R-manual/R-devel/library/utils/html/read.table.html] in R where possible in this case. In addition, we already handle {{nullValue}} when processing the header, by generating numbered names. I guess we should clarify the behaviour and write it down in the PR description, including the corresponding cases in R.

Also, do not forget to follow https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark when making a contribution.

> Loading csv with duplicate column names
> ---------------------------------------
>
>                 Key: SPARK-16896
>                 URL: https://issues.apache.org/jira/browse/SPARK-16896
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Aseem Bansal
>
> It would be great if the library allowed us to load csv with duplicate column
> names. I understand that having duplicate columns in the data is odd, but
> sometimes we get data that has duplicate columns. Getting upstream data like
> that can happen. We may choose to ignore them, but currently there is no way
> to drop them because we are not able to load them at all. Currently, as a
> pre-processing step, I loaded the data into R, changed the column names and
> then made a fixed version with which the Spark Java API can work.
> But if we talk about other options, e.g. R has read.csv, which automatically
> takes care of this situation by appending a number to the column name.
> Case sensitivity in column names can also cause problems. I mean, if we
> have columns like
> ColumnName, columnName
> I may want to keep them as separate columns. But the option to do this is not
> documented.
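
For illustration only, the renaming discussed above (appending a number to a repeated column name, with an optional case-insensitive comparison) could be sketched as follows. This is a standalone sketch, not Spark's implementation and not an exact replica of R's behaviour; the function name {{dedupe_column_names}}, the {{case_sensitive}} parameter and the {{.N}} suffix format are assumptions made for the example.

```python
def dedupe_column_names(names, case_sensitive=True):
    """Return a copy of `names` in which repeated names get a numeric suffix.

    Hypothetical helper for illustration; the ".N" suffix is an assumption
    loosely modelled on R, not Spark's actual behaviour.
    """
    key = (lambda s: s) if case_sensitive else str.lower
    taken = {key(n) for n in names}   # names we must never generate
    seen = set()                      # names already emitted
    counters = {}                     # last suffix tried per base name
    result = []
    for name in names:
        k = key(name)
        if k not in seen:
            # First occurrence keeps its original name.
            seen.add(k)
            result.append(name)
            continue
        # Repeated name: bump the suffix until the candidate is unused.
        n = counters.get(k, 0)
        while True:
            n += 1
            candidate = f"{name}.{n}"
            if key(candidate) not in taken:
                break
        counters[k] = n
        taken.add(key(candidate))
        seen.add(key(candidate))
        result.append(candidate)
    return result


if __name__ == "__main__":
    print(dedupe_column_names(["id", "value", "id", "id"]))
    print(dedupe_column_names(["ColumnName", "columnName"], case_sensitive=False))
```

With {{case_sensitive=True}}, {{ColumnName}} and {{columnName}} are treated as distinct and left unchanged, which matches the reporter's wish to keep them as separate columns.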