Could you provide more information on how df in your example is created? Also, please include the output from printSchema(df).
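As a rough sketch of what would help here: the error text names two attributes, ra#8146 and ra#13501, which suggests df may already contain two columns called "ra" (for example, if it came out of an earlier join). The helpers below are hypothetical names, not part of SparkR; they only wrap the real SparkR calls printSchema() and columns(), and assume df is a SparkDataFrame.

```r
# Hypothetical helper (not part of SparkR): return the names that occur
# more than once in a character vector of column names.
dup_names <- function(nms) {
  unique(nms[duplicated(nms)])
}

# Hypothetical helper: print the schema of a SparkDataFrame and return any
# repeated column names. Assumes the SparkR package is installed and a
# Spark session is running when it is called.
describe_df <- function(df) {
  SparkR::printSchema(df)
  dup_names(SparkR::columns(df))
}

# Plain-R illustration of the duplicate check; no Spark needed for this line.
dup_names(c("ra", "dec", "ra"))
```

If describe_df(df) returns "ra", then df carries two columns with that name, and any reference such as df$ra cannot be resolved to just one of them.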
This example works:

> c <- createDataFrame(cars)
> c
SparkDataFrame[speed:double, dist:double]
> c$speed <- c$dist*0
> c
SparkDataFrame[speed:double, dist:double]
> head(c)
  speed dist
1     0    2
2     0   10
3     0    4
4     0   22
5     0   16
6     0   10

_____________________________
From: Bedrytski Aliaksandr <sp...@bedryt.ski>
Sent: Friday, September 9, 2016 9:13 PM
Subject: Re: SparkR error: reference is ambiguous.
To: xingye <tracy.up...@gmail.com>
Cc: <user@spark.apache.org>

Hi,

Can you use full-string queries in SparkR? Like (in Scala):

df1.registerTempTable("df1")
df2.registerTempTable("df2")
val df3 = sqlContext.sql("SELECT * FROM df1 JOIN df2 ON df1.ra = df2.ra")

Explicitly mentioning table names in the query often solves ambiguity problems.

Regards
--
Bedrytski Aliaksandr
sp...@bedryt.ski

On Fri, Sep 9, 2016, at 19:33, xingye wrote:

Not sure whether this is the right distribution list to ask questions on. If not, can someone point me to a list where I can find help?

I keep getting a "reference is ambiguous" error when implementing some SparkR code.

1. When I tried to assign values to a column using an existing column:

   df$c_mon<- df$ra*0

   16/09/09 15:11:28 ERROR RBackendHandler: col on 3101 failed
   Error in invokeJava(isStatic = FALSE, objId$id, methodName, ...) :
   org.apache.spark.sql.AnalysisException: Reference 'ra' is ambiguous, could be: ra#8146, ra#13501.;

2. When I joined two Spark dataframes using the key:

   df3<-join(df1, df2, df1$ra == df2$ra, "left")

   16/09/09 14:48:07 WARN Column: Constructing trivially true equals predicate, 'ra#8146 = ra#8146'. Perhaps you need to use aliases.

"ra" is the actual column name; I don't know why SparkR keeps raising errors about ra#8146 or ra#13501.

Can someone help?

Thanks