Re: Error joining dataframes

2016-05-18 Thread ram kumar
I tried it, e.g.:

df_join = df1.join(df2, df1("Id") === df2("Id"), "fullouter")

+----+----+----+----+
|  id|   A|  id|   B|
+----+----+----+----+
|   1|   0|null|null|
|   2|   0|   2|   0|
|null|null|   3|   0|
+----+----+----+----+

if I try, df_join = df1.join(df2,df1( "Id")

Re: Error joining dataframes

2016-05-18 Thread Divya Gehlot
Can you try

var df_join = df1.join(df2, df1("Id") === df2("Id"), "fullouter").drop(df1("Id"))

On May 18, 2016 2:16 PM, "ram kumar" wrote:

> I tried
>
> scala> var df_join = df1.join(df2, "Id", "fullouter")
> :27: error: type mismatch;
>  found   : String("Id")
>  required:

Re: Error joining dataframes

2016-05-18 Thread Takeshi Yamamuro
Ah, yes. `df_join` has two `id` columns, so you need to select which id to use:

scala> :paste
// Entering paste mode (ctrl-D to finish)
val df1 = Seq((1, 0), (2, 0)).toDF("id", "A")
val df2 = Seq((2, 0), (3, 0)).toDF("id", "B")
val df3 = df1.join(df2, df1("id") === df2("id"), "outer")
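
One way to pick a single id after the outer join, sketched here as an illustration and not as what the truncated message above goes on to show: merge the two key columns with `coalesce` so that rows present on only one side still get a non-null `id`. The object name `CoalesceJoinExample`, the app name, and the local master setting are assumptions made only to keep the sketch self-contained; in spark-shell the `sqlContext` and its implicits are already available.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions.coalesce

// Hypothetical wrapper object; only needed outside spark-shell.
object CoalesceJoinExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("coalesce-join-example").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Same sample data as in the message above.
    val df1 = Seq((1, 0), (2, 0)).toDF("id", "A")
    val df2 = Seq((2, 0), (3, 0)).toDF("id", "B")

    // Full outer join on an explicit condition, then keep one id column:
    // coalesce returns the first non-null value among its arguments.
    val joined = df1
      .join(df2, df1("id") === df2("id"), "outer")
      .select(coalesce(df1("id"), df2("id")).as("id"), df1("A"), df2("B"))

    joined.show()
    sc.stop()
  }
}
```

Because the result has only one column named `id`, a later `select("id")` or SQL query against it should no longer hit the ambiguous-reference error.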

Re: Error joining dataframes

2016-05-18 Thread ram kumar
When you register a temp table from the dataframe eg: var df_join = df1.join(df2, df1("id") === df2("id"), "outer") df_join.registerTempTable("test") sqlContext.sql("select * from test") +++++ | id| A| id| B| +++++ | 1| 0|null|null| | 2| 0| 2|

Re: Error joining dataframes

2016-05-18 Thread Takeshi Yamamuro
Looks weird; it seems spark-v1.5.x can accept the query. What's the difference between the example and your query?

Welcome to Spark version 1.5.2

scala> :paste //

Re: Error joining dataframes

2016-05-18 Thread ram kumar
I tried df1.join(df2, df1("id") === df2("id"), "outer").show But there is a duplicate "id" and when I query the "id", I get *Error*: org.apache.spark.sql.AnalysisException: Reference 'Id' is *ambiguous*, could be: Id#128, Id#155.; line 1 pos 7 (state=,code=0) I am currently using spark 1.5.2. Is
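
A possible workaround for the ambiguous reference when the joined result is queried through SQL, offered as a sketch under assumptions and not as the thread's confirmed resolution: rename the key on one side before joining, so the registered temp table has distinct column names. The column name `id_right`, the object name, and the local master setting are hypothetical; the calls used (`withColumnRenamed`, `registerTempTable`) are from the Spark 1.x DataFrame API.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical wrapper object; only needed outside spark-shell.
object RenameBeforeJoinExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("rename-before-join").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    val df1 = Seq((1, 0), (2, 0)).toDF("id", "A")
    val df2 = Seq((2, 0), (3, 0)).toDF("id", "B")

    // Give the right-hand key a distinct name ("id_right" is an assumed name).
    val right = df2.withColumnRenamed("id", "id_right")

    val joined = df1.join(right, df1("id") === right("id_right"), "outer")
    joined.registerTempTable("test")

    // Every column name is now unique, so the SQL reference is unambiguous.
    sqlContext.sql("select id, id_right, A, B from test").show()
    sc.stop()
  }
}
```

Both keys stay available under separate names, which leaves it to the query to decide which one to use for rows that exist on only one side.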

Re: Error joining dataframes

2016-05-18 Thread ram kumar
If I run as val rs = s.join(t,"time_id").join(c,"channel_id") It takes as inner join. On Wed, May 18, 2016 at 2:31 AM, Mich Talebzadeh wrote: > pretty simple, a similar construct to tables projected as DF > > val c =

Re: Error joining dataframes

2016-05-18 Thread Takeshi Yamamuro
You can use the api in spark-v1.6+. https://github.com/apache/spark/blob/branch-1.6/sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala#L454 // maropu On Wed, May 18, 2016 at 3:16 PM, ram kumar wrote: > I tried > > scala> var df_join = df1.join(df2, "Id",
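
For illustration, a minimal sketch of the Spark 1.6+ overload that takes the join columns as a `Seq[String]` together with a join type. This will not compile on 1.5.x, where the `Seq[String]` variant has no join-type argument. The object name and local master setting are assumptions, and how the single `id` column is filled for rows that exist only on the right side may differ between Spark versions, so it is worth checking the output on your own version.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical wrapper object; requires Spark 1.6 or later.
object UsingColumnsJoinExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("using-columns-join").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    val df1 = Seq((1, 0), (2, 0)).toDF("id", "A")
    val df2 = Seq((2, 0), (3, 0)).toDF("id", "B")

    // The join key is passed as a Seq of column names, not a bare String,
    // and the result carries a single "id" column.
    val joined = df1.join(df2, Seq("id"), "fullouter")

    joined.show()
    sc.stop()
  }
}
```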

Re: Error joining dataframes

2016-05-18 Thread ram kumar
I tried

scala> var df_join = df1.join(df2, "Id", "fullouter")
:27: error: type mismatch;
 found   : String("Id")
 required: org.apache.spark.sql.Column
       var df_join = df1.join(df2, "Id", "fullouter")
                                   ^
scala>

And I can't see the above method in

Re: Error joining dataframes

2016-05-17 Thread Mich Talebzadeh
pretty simple, a similar construct to tables projected as DF val c = HiveContext.table("channels").select("CHANNEL_ID","CHANNEL_DESC") val t = HiveContext.table("times").select("TIME_ID","CALENDAR_MONTH_DESC") val rs = s.join(t,"time_id").join(c,"channel_id") HTH Dr Mich Talebzadeh LinkedIn

Re: Error joining dataframes

2016-05-17 Thread Bijay Kumar Pathak
Hi,

Try this one:

df_join = df1.join(df2, 'Id', "fullouter")

Thanks,
Bijay

On Tue, May 17, 2016 at 9:39 AM, ram kumar wrote:

> Hi,
>
> I tried to join two dataframe
>
> df_join = df1.join(df2, ((df1("Id") === df2("Id")), "fullouter")
>
>