Sun Rui created SPARK-11976:
-------------------------------

             Summary: Support "." character in DataFrame column name
                 Key: SPARK-11976
                 URL: https://issues.apache.org/jira/browse/SPARK-11976
             Project: Spark
          Issue Type: Improvement
          Components: SparkR
    Affects Versions: 1.5.2
            Reporter: Sun Rui


Spark Core now supports the "." character in DataFrame column names. However, 
when accessing a column whose name contains a ".", the name must be wrapped in 
backticks.

For example:
{code}
> df<-createDataFrame(sqlContext, list(list(1,2,3)))
> names(df)<-c("a.b","c","d.e")
> df$"`a.b`"
Column a.b 
> df$"a.b"
15/11/25 10:55:06 ERROR RBackendHandler: col on 68 failed
Error in column(callJMethod(x@sdf, "col", c)) : 
  error in evaluating the argument 'x' in selecting a method for function 'column': Error in invokeJava(isStatic = FALSE, objId$id, methodName, ...) : 
  org.apache.spark.sql.AnalysisException: Cannot resolve column name "a.b" among (a.b, c, d.e);
        at org.apache.spark.sql.DataFrame$$anonfun$resolve$1.apply(DataFrame.scala:151)
        at org.apache.spark.sql.DataFrame$$anonfun$resolve$1.apply(DataFrame.scala:151)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.sql.DataFrame.resolve(DataFrame.scala:150)
        at org.apache.spark.sql.DataFrame.col(DataFrame.scala:663)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.api.r.RBackendHandler.handleMethodCall(RBackendHandler.scala:141)
        at org.apache.spark.api.r.RBackendHa
{code}

This means that when a column name is fetched programmatically rather than 
known in advance, the safe way to select that column by name is to wrap the 
name in backticks.
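
A minimal sketch of that workaround, assuming the names are read at run time 
(for example from names(df)). quoteColumnName is a hypothetical helper, not 
part of SparkR, and it assumes the names contain no backticks themselves:
{code}
# Hypothetical helper, not part of SparkR: wrap column names in backticks so
# Spark SQL resolves "a.b" as one column instead of a nested reference a -> b.
# Assumption: the names contain no backticks themselves (no escaping is done).
quoteColumnName <- function(name) {
  paste0("`", name, "`")  # paste0 is vectorized, so a whole names() vector works
}

# Names fetched at run time, e.g. colNames <- names(df) in a SparkR session
colNames <- c("a.b", "c", "d.e")
quoted <- quoteColumnName(colNames)
print(quoted)
{code}

Each resulting string, such as "`a.b`", has the same form as the one accepted 
by df$"`a.b`" in the session above.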

Once this is supported, the following code can be removed from 
createDataFrame():
{code}
    # Spark SQL does not support '.' in column name, so replace it with '_'
    # TODO(davies): remove this once SPARK-2775 is fixed
    names <- lapply(names, function(n) {
      nn <- gsub("[.]", "_", n)
      if (nn != n) {
        warning(paste("Use", nn, "instead of", n, "as column name"))
      }
      nn
    })
{code}