[jira] [Comment Edited] (SPARK-17214) How to deal with dots (.) present in column names in SparkR

Felix Cheung (JIRA) Sat, 27 Aug 2016 23:15:31 -0700

    [ 
https://issues.apache.org/jira/browse/SPARK-17214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15442870#comment-15442870
 ]


Felix Cheung edited comment on SPARK-17214 at 8/28/16 6:15 AM:
---------------------------------------------------------------

I think the underlying issue is that we should either handle column names with 
`.` correctly (preferred) or translate them uniformly as in other cases (e.g. 
`as.DataFrame`).
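
As a rough illustration of the "translate uniformly" option, the renaming can be done in plain R before the file is written, so the csv source never sees a dotted name. This is only a sketch: `sanitize_names` is a hypothetical helper (not a SparkR function), and it assumes the intended translation is replacing each `.` with `_`, as the reporter did by hand.

```r
# Hypothetical helper, not part of SparkR: replace every literal dot in a
# column name with an underscore. fixed = TRUE treats "." as a plain
# character rather than a regex wildcard.
sanitize_names <- function(x) gsub(".", "_", x, fixed = TRUE)

df <- iris
names(df) <- sanitize_names(names(df))
names(df)
# "Sepal_Length" "Sepal_Width" "Petal_Length" "Petal_Width" "Species"

# Write the renamed data to a temporary csv; a file produced this way has
# no dots in its header for read.df to trip over.
path <- tempfile(fileext = ".csv")
write.csv(df, path, row.names = FALSE)
```

This sidesteps the problem rather than fixing it: it only helps when the csv can be rewritten before Spark reads it.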

As of now, a DataFrame from a csv source can have `.` in its column names, and it is 
unusable until renamed (which is a known issue):
{code}
> iris_sdf<-read.df("iris.csv","csv",header="true",inferSchema="true")
> iris_sdf
SparkDataFrame[Sepal.Length:double, Sepal.Width:double, Petal.Length:double, 
Petal.Width:double, Species:string]
> head(select(iris_sdf,iris_sdf$Sepal.Length))
16/08/28 06:11:16 ERROR RBackendHandler: col on 46 failed
Error in invokeJava(isStatic = FALSE, objId$id, methodName, ...) :
  org.apache.spark.sql.AnalysisException: Cannot resolve column name 
"Sepal.Length" among (Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, 
Species);
{code}


> How to deal with dots (.) present in column names in SparkR
> -----------------------------------------------------------
>
>                 Key: SPARK-17214
>                 URL: https://issues.apache.org/jira/browse/SPARK-17214
>             Project: Spark
>          Issue Type: Bug
>            Reporter: Mohit Bansal
>
> I am trying to load a local csv file into SparkR, which contains dots in 
> column names. After reading the file I tried to change the names and replaced 
> "." with "_". Still I am not able to do any operation on the created SDF. 
> Here is the reproducible code:
> -------------------------------------------------------------------------------
> #writing iris dataset to local
> write.csv(iris,"iris.csv",row.names=F)
> #reading it back using read.df
> iris_sdf<-read.df("iris.csv","csv",header="true",inferSchema="true")
> #changing column names
> names(iris_sdf)<-c("Sepal_Length","Sepal_Width","Petal_Length","Petal_Width","Species")
> #selecting required columns
> head(select(iris_sdf,iris_sdf$Sepal_Length,iris_sdf$Sepal_Width))
> ---------------------------------------------------------------------------------
> 16/08/24 13:51:24 ERROR RBackendHandler: dfToCols on 
> org.apache.spark.sql.api.r.SQLUtils failed
> Error in invokeJava(isStatic = TRUE, className, methodName, ...) : 
>   org.apache.spark.sql.AnalysisException: Unable to resolve Sepal.Length 
> given [Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, Species];
>     at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolve$1$$anonfun$apply$5.apply(LogicalPlan.scala:134)
>     at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolve$1$$anonfun$apply$5.apply(LogicalPlan.scala:134)
>     at scala.Option.getOrElse(Option.scala:121)
>     at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolve$1.apply(LogicalPlan.scala:133)
>     at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolve$1.apply(LogicalPlan.scala:129)
>     at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>     at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>     at scala.collection.Iterator$class.foreach(Iterator.scala:893)
>     at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
>     at scala.collection.IterableLike$cl
> What should I do to get it to work?


