Hi, If you join logic is correct, it seems to be a similar issue which i faced recently
Can you try by *SparkContext(conf).set("spark.driver.allowMultipleContexts","true")* Regards, Satish Chandra On Mon, Aug 24, 2015 at 2:51 PM, Ilya Karpov <i.kar...@cleverdata.ru> wrote: > Hi, guys > I'm confused about joining columns in SparkSQL and need your advice. > I want to join 2 datasets of profiles. Each profile has name and array of > attributes(age, gender, email etc). > There can be mutliple instances of attribute with the same name, e.g. > profile has 2 emails - so 2 attributes with name = 'email' in > array. Now I want to join 2 datasets using 'email' attribute. I cant find > the way to do it :( > > The code is below. Now result of join is empty, while I expect to see 1 > row with all Alice emails. > > import org.apache.spark.sql.{DataFrame, SQLContext} > import org.apache.spark.{SparkConf, SparkContext} > > case class Attribute(name: String, value: String, weight: Float) > case class Profile(name: String, attributes: Seq[Attribute]) > > object SparkJoinArrayColumn { > def main(args: Array[String]) { > val sc: SparkContext = new SparkContext(new > SparkConf().setMaster("local").setAppName(getClass.getSimpleName)) > val sqlContext: SQLContext = new SQLContext(sc) > > import sqlContext.implicits._ > > val a: DataFrame = sc.parallelize(Seq( > Profile("Alice", Seq(Attribute("email", "al...@mail.com", 1.0f), > Attribute("email", "a.jo...@mail.com", 1.0f))) > )).toDF.as("a") > > val b: DataFrame = sc.parallelize(Seq( > Profile("Alice", Seq(Attribute("email", "al...@mail.com", 1.0f), > Attribute("age", "29", 0.2f))) > )).toDF.as("b") > > > a.where($"a.attributes.name" === "email") > .join( > b.where($"b.attributes.name" === "email"), > $"a.attributes.value" === $"b.attributes.value" > ) > .show() > } > } > > Thanks forward! > --------------------------------------------------------------------- > To unsubscribe, e-mail: user-unsubscr...@spark.apache.org > For additional commands, e-mail: user-h...@spark.apache.org > >