Yun Zhao created SPARK-10838:
--------------------------------

             Summary: Joining the same DataFrame twice raises an AnalysisException.
                 Key: SPARK-10838
                 URL: https://issues.apache.org/jira/browse/SPARK-10838
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 1.4.1
            Reporter: Yun Zhao


The details of the exception are:
{quote}
Exception in thread "main" org.apache.spark.sql.AnalysisException: resolved attribute(s) col_a#1 missing from col_a#0,col_b#2,col_a#3,col_b#4 in operator !Join Inner, Some((col_b#2 = col_a#1));
        at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:37)
        at org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:44)
        at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:154)
        at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:49)
        at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:103)
        at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:49)
        at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:44)
        at org.apache.spark.sql.SQLContext$QueryExecution.assertAnalyzed(SQLContext.scala:908)
        at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:132)
        at org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$$logicalPlanToDataFrame(DataFrame.scala:154)
        at org.apache.spark.sql.DataFrame.join(DataFrame.scala:554)
        at org.apache.spark.sql.DataFrame.join(DataFrame.scala:521)
{quote}

The code that reproduces the issue is:
{quote}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object DFJoinTest extends App {

  case class Foo(col_a: String)

  case class Bar(col_a: String, col_b: String)

  val sc = new SparkContext(new SparkConf().setMaster("local").setAppName("DFJoinTest"))
  val sqlContext = new SQLContext(sc)

  import sqlContext.implicits._

  val df1 = sc.parallelize(Array("1")).map(_.split(",")).map(p => Foo(p(0))).toDF()
  val df2 = sc.parallelize(Array("1,1")).map(_.split(",")).map(p => Bar(p(0), p(1))).toDF()

  val df3 = df1.join(df2, df1("col_a") === df2("col_a")).select(df1("col_a"), $"col_b")

  df3.join(df2, df3("col_b") === df2("col_a")).show()

  //  val df4 = df2.as("df4")
  //  df3.join(df4, df3("col_b") === df4("col_a")).show()

  //  df3.join(df2.as("df4"), df3("col_b") === $"df4.col_a").show()

  sc.stop()
}
{quote}

When using
{quote}
val df4 = df2.as("df4")
df3.join(df4, df3("col_b") === df4("col_a")).show()
{quote}
the same error occurs, but when using
{quote}
df3.join(df2.as("df4"), df3("col_b") === $"df4.col_a").show()
{quote}
it works correctly.
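
For reference, a self-contained sketch of the working variant, assuming the same Spark 1.4.x SQLContext API as the reproduction above; the second join refers to the key through the alias-qualified name $"df4.col_a" instead of df2("col_a"):
{quote}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object DFJoinWorkaround extends App {

  case class Foo(col_a: String)
  case class Bar(col_a: String, col_b: String)

  val sc = new SparkContext(new SparkConf().setMaster("local").setAppName("DFJoinWorkaround"))
  val sqlContext = new SQLContext(sc)

  import sqlContext.implicits._

  val df1 = sc.parallelize(Array("1")).map(_.split(",")).map(p => Foo(p(0))).toDF()
  val df2 = sc.parallelize(Array("1,1")).map(_.split(",")).map(p => Bar(p(0), p(1))).toDF()

  // First join: keep df1's col_a and df2's col_b.
  val df3 = df1.join(df2, df1("col_a") === df2("col_a")).select(df1("col_a"), $"col_b")

  // Second join against the same DataFrame: alias df2 inline and refer to the
  // join key by its alias-qualified name, so it is resolved by name rather than
  // through the attribute reference already used in the first join.
  df3.join(df2.as("df4"), df3("col_b") === $"df4.col_a").show()

  sc.stop()
}
{quote}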


