Sun Rui created SPARK-11976:
-------------------------------
Summary: Support "." character in DataFrame column name
Key: SPARK-11976
URL: https://issues.apache.org/jira/browse/SPARK-11976
Project: Spark
Issue Type: Improvement
Components: SparkR
Affects Versions: 1.5.2
Reporter: Sun Rui
Spark Core now supports the "." character in DataFrame column names. However,
when accessing a column whose name contains a "." character, the name must be
wrapped in backticks.
For example:
{code}
> df<-createDataFrame(sqlContext, list(list(1,2,3)))
> names(df)<-c("a.b","c","d.e")
> df$"`a.b`"
Column a.b
> df$"a.b"
15/11/25 10:55:06 ERROR RBackendHandler: col on 68 failed
Error in column(callJMethod(x@sdf, "col", c)) :
error in evaluating the argument 'x' in selecting a method for function
'column': Error in invokeJava(isStatic = FALSE, objId$id, methodName, ...) :
org.apache.spark.sql.AnalysisException: Cannot resolve column name "a.b" among (a.b, c, d.e);
at org.apache.spark.sql.DataFrame$$anonfun$resolve$1.apply(DataFrame.scala:151)
at org.apache.spark.sql.DataFrame$$anonfun$resolve$1.apply(DataFrame.scala:151)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.sql.DataFrame.resolve(DataFrame.scala:150)
at org.apache.spark.sql.DataFrame.col(DataFrame.scala:663)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.api.r.RBackendHandler.handleMethodCall(RBackendHandler.scala:141)
at org.apache.spark.api.r.RBackendHa
{code}
This means that when a column name is fetched programmatically rather than
known in advance, the only safe way to select the column is to wrap its name
in backticks.
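As a rough illustration of that workaround, the sketch below wraps names in backticks before they are used for column access. The helper name quoteColName is hypothetical and not part of SparkR, and the sketch assumes the names themselves contain no backtick. It is plain R and does not need a Spark session to run.

```r
# Hypothetical helper (not part of SparkR): wrap column names in backticks so
# that a "." in a name is not interpreted as a nested-field separator.
# Assumption: the names themselves contain no backtick character.
quoteColName <- function(name) {
  stopifnot(is.character(name), !any(grepl("`", name, fixed = TRUE)))
  paste0("`", name, "`")
}

# The column names used in the example above.
quoted <- quoteColName(c("a.b", "c", "d.e"))
print(quoted)
```

Each quoted string could then be used in place of the hand-written "`a.b`" from the example when the name is only known at run time.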
Once this is supported, the following code can be removed from
createDataFrame():
{code}
# Spark SQL does not support '.' in column name, so replace it with '_'
# TODO(davies): remove this once SPARK-2775 is fixed
names <- lapply(names, function(n) {
  nn <- gsub("[.]", "_", n)
  if (nn != n) {
    warning(paste("Use", nn, "instead of", n, " as column name"))
  }
  nn
})
{code}