Looks weird; it seems spark-v1.5.x can accept the query. What's the difference between the example and your query?
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.5.2
      /_/

scala> :paste
// Entering paste mode (ctrl-D to finish)

val df1 = Seq((1, 0), (2, 0)).toDF("id", "A")
val df2 = Seq((2, 0), (3, 0)).toDF("id", "B")
df1.join(df2, df1("id") === df2("id"), "outer").show

// Exiting paste mode, now interpreting.

+----+----+----+----+
|  id|   A|  id|   B|
+----+----+----+----+
|   1|   0|null|null|
|   2|   0|   2|   0|
|null|null|   3|   0|
+----+----+----+----+

df1: org.apache.spark.sql.DataFrame = [id: int, A: int]
df2: org.apache.spark.sql.DataFrame = [id: int, B: int]

On Wed, May 18, 2016 at 3:52 PM, ram kumar <ramkumarro...@gmail.com> wrote:

> I tried
> df1.join(df2, df1("id") === df2("id"), "outer").show
>
> But there is a duplicate "id", and when I query the "id", I get
> Error: org.apache.spark.sql.AnalysisException: Reference 'Id' is
> ambiguous, could be: Id#128, Id#155.; line 1 pos 7 (state=,code=0)
>
> I am currently using spark 1.5.2.
> Is there any alternative way in 1.5?
>
> Thanks
>
> On Wed, May 18, 2016 at 12:12 PM, Takeshi Yamamuro <linguin....@gmail.com> wrote:
>
>> Also, you can pass the query that you'd like to use in spark-v1.6+;
>>
>> val df1 = Seq((1, 0), (2, 0), (3, 0)).toDF("id", "A")
>> val df2 = Seq((1, 0), (2, 0), (3, 0)).toDF("id", "B")
>> df1.join(df2, df1("id") === df2("id"), "outer").show
>>
>> // maropu
>>
>> On Wed, May 18, 2016 at 3:29 PM, ram kumar <ramkumarro...@gmail.com> wrote:
>>
>>> If I run it as
>>> val rs = s.join(t,"time_id").join(c,"channel_id")
>>>
>>> it is treated as an inner join.
>>>
>>> On Wed, May 18, 2016 at 2:31 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>
>>>> Pretty simple, a similar construct to tables projected as DFs:
>>>>
>>>> val c = HiveContext.table("channels").select("CHANNEL_ID","CHANNEL_DESC")
>>>> val t = HiveContext.table("times").select("TIME_ID","CALENDAR_MONTH_DESC")
>>>> val rs = s.join(t,"time_id").join(c,"channel_id")
>>>>
>>>> HTH
>>>>
>>>> Dr Mich Talebzadeh
>>>>
>>>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>
>>>> http://talebzadehmich.wordpress.com
>>>>
>>>> On 17 May 2016 at 21:52, Bijay Kumar Pathak <bkpat...@mtu.edu> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Try this one:
>>>>>
>>>>> df_join = df1.join(df2, 'Id', "fullouter")
>>>>>
>>>>> Thanks,
>>>>> Bijay
>>>>>
>>>>> On Tue, May 17, 2016 at 9:39 AM, ram kumar <ramkumarro...@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I tried to join two dataframes:
>>>>>>
>>>>>> df_join = df1.join(df2, df1("Id") === df2("Id"), "fullouter")
>>>>>>
>>>>>> df_join.registerTempTable("join_test")
>>>>>>
>>>>>> When querying "Id" from "join_test":
>>>>>>
>>>>>> 0: jdbc:hive2://> select Id from join_test;
>>>>>> Error: org.apache.spark.sql.AnalysisException: Reference 'Id' is
>>>>>> ambiguous, could be: Id#128, Id#155.; line 1 pos 7 (state=,code=0)
>>>>>> 0: jdbc:hive2://>
>>>>>>
>>>>>> Is there a way to merge the values of df1("Id") and df2("Id") into one
>>>>>> "Id"?
>>>>>>
>>>>>> Thanks
>>
>> --
>> ---
>> Takeshi Yamamuro

--
---
Takeshi Yamamuro
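For the 1.5 question in the thread (how to get a single "id" column when the join-on-column-names overloads of 1.6 are not available), one possible workaround, not shown in the thread, is to rename one side's key before joining and then coalesce the two keys back into one column. A minimal sketch, assuming a spark-shell session so the `toDF` implicits are already in scope; the names `df1`, `df2`, and `joined` are illustrative:

```scala
// Workaround sketch for Spark 1.5: rename df2's key up front so the two
// key columns have distinct names, join on the expression, then coalesce
// the two keys into a single unambiguous "id" column.
import org.apache.spark.sql.functions.coalesce

val df1 = Seq((1, 0), (2, 0)).toDF("id", "A")
val df2 = Seq((2, 0), (3, 0)).toDF("id2", "B") // key renamed to avoid ambiguity

val joined = df1.join(df2, df1("id") === df2("id2"), "outer")
  .select(coalesce(df1("id"), df2("id2")).as("id"), df1("A"), df2("B"))

joined.show
```

After registering `joined` as a temp table, `select id from join_test` hits only one `id` column, so the `AnalysisException` from the original message should not occur.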
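To make the 1.6+ suggestion from the thread concrete: passing the join key as a `Seq` of column names, rather than as a column expression, makes Spark produce a single merged key column in the output, so querying `Id` is no longer ambiguous. A sketch under the same spark-shell assumption:

```scala
val df1 = Seq((1, 0), (2, 0)).toDF("id", "A")
val df2 = Seq((2, 0), (3, 0)).toDF("id", "B")

// Joining on a Seq of column names (available with an explicit join type
// from Spark 1.6) merges the key into one output column, instead of the
// two "id" columns produced by df1("id") === df2("id").
val merged = df1.join(df2, Seq("id"), "outer")
merged.show
```

The expression form `df1("id") === df2("id")` keeps both key columns by design; the `Seq("id")` form is what resolves the `Reference 'Id' is ambiguous` error downstream.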