Tiago Albineli Motta created SPARK-16506:
--------------------------------------------

             Summary: Subsequent dataframe join dont work
                 Key: SPARK-16506
                 URL: https://issues.apache.org/jira/browse/SPARK-16506
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 1.6.2
            Reporter: Tiago Albineli Motta
            Priority: Minor


Here is the example code:

{quote}
      import sql.implicits._

      val objs = sc.parallelize(Seq(("1", "um"), ("2", "dois"), ("3", 
"tres"))).toDF.selectExpr("_1 as id", "_2 as name")
      
      val rawj = sc.parallelize(Seq(("1", "2"),  ("1", "3"), ("2", "3"), ("2", 
"1"))).toDF.selectExpr("_1 as id1", "_2 as id2")
      
      val join1 = rawj.join(objs, objs("id") === rawj("id1"))
        .withColumnRenamed("id", "anything")
        
      println("works...")
      val join2a = join1.join(objs, 'id2 === 'id )
      join2a.show()
      
      println("works...")
      val join2b = objs.join(join1, objs("id") === join1("id2"))
      join2b.show()
      
      println("do not works...")
      val join2c = join1.join(objs, join1("id2") === objs("id") )
      join2c.show()
{quote}

Fisrt two joins work. But the last one gave me this error:

{quote}
Exception in thread "main" org.apache.spark.sql.AnalysisException: resolved 
attribute(s) id#2 missing from anything#8,name#14,name#3,id1#6,id2#7,id#13 in 
operator !Join Inner, Some((id2#7 = id#2));
        at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:38)
        at 
org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:44)
        at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:183)
        at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:50)
        at 
org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:105)
{quote}

Without the first column rename, the error happens in silence since the join 
get empty:

{quote}
      import sql.implicits._

      val objs = sc.parallelize(Seq(("1", "um"), ("2", "dois"), ("3", 
"tres"))).toDF.selectExpr("_1 as id", "_2 as name")
      
      val rawj = sc.parallelize(Seq(("1", "2"),  ("1", "3"), ("2", "3"), ("2", 
"1"))).toDF.selectExpr("_1 as id1", "_2 as id2")
      
      val join1 = rawj.join(objs, objs("id") === rawj("id1"))
      
      println("do not works...")
      val join2c = join1.join(objs, join1("id2") === objs("id") )
      join2c.show()
{quote}




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to