[ https://issues.apache.org/jira/browse/SPARK-21216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16063662#comment-16063662 ]
Apache Spark commented on SPARK-21216:
--------------------------------------
User 'brkyvz' has created a pull request for this issue:
https://github.com/apache/spark/pull/18426
> Streaming DataFrames fail to join with Hive tables
> --------------------------------------------------
>
> Key: SPARK-21216
> URL: https://issues.apache.org/jira/browse/SPARK-21216
> Project: Spark
> Issue Type: Bug
> Components: Structured Streaming
> Affects Versions: 2.1.1
> Reporter: Burak Yavuz
> Assignee: Burak Yavuz
>
> The following code will throw a cryptic exception:
> {code}
> import org.apache.spark.sql.execution.streaming.MemoryStream
> import testImplicits._
> implicit val _sqlContext = spark.sqlContext
> Seq((1, "one"), (2, "two"), (4, "four")).toDF("number", "word").createOrReplaceTempView("t1")
> // Make a table and ensure it will be broadcast.
> sql("""CREATE TABLE smallTable(word string, number int)
> |ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
> |STORED AS TEXTFILE
> """.stripMargin)
> sql(
> """INSERT INTO smallTable
> |SELECT word, number from t1
> """.stripMargin)
> val inputData = MemoryStream[Int]
> val joined = inputData.toDS().toDF()
> .join(spark.table("smallTable"), $"value" === $"number")
> val sq = joined.writeStream
> .format("memory")
> .queryName("t2")
> .start()
> try {
> inputData.addData(1, 2)
> sq.processAllAvailable()
> } finally {
> sq.stop()
> }
> {code}
> If someone creates a HiveSession, the planner in `IncrementalExecution`
> doesn't take the Hive scan strategies into account, so the streaming query
> cannot plan the scan of the Hive table.
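
The failure mode described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: `PlannerSketch`, `Strategy`, `Planner`, `StreamSource`, `HiveTable` and the planner values are hypothetical names invented for illustration, and none of them are Spark's real classes. It shows a planner that walks a list of strategies, and how a second planner built with its own shorter list fails on a node that only the session's list knows how to handle. Whether the linked pull request fixes the bug by sharing the session's strategy list is not stated in this message.

```scala
import scala.util.Try

// Toy model only: none of these names are Spark's real classes or APIs.
object PlannerSketch {
  sealed trait LogicalPlan
  final case class StreamSource(name: String) extends LogicalPlan
  final case class HiveTable(name: String) extends LogicalPlan

  // A strategy either produces a physical plan (here just a String) or declines.
  type Strategy = LogicalPlan => Option[String]

  val streamScan: Strategy = {
    case StreamSource(n) => Some(s"StreamScan($n)")
    case _               => None
  }

  val hiveScan: Strategy = {
    case HiveTable(n) => Some(s"HiveTableScan($n)")
    case _            => None
  }

  final class Planner(strategies: Seq[Strategy]) {
    // Try each strategy in order and fail if none of them applies.
    def plan(p: LogicalPlan): String =
      strategies.flatMap(s => s(p).toList).headOption
        .getOrElse(throw new IllegalStateException(s"No plan for $p"))
  }

  // The session-level planner knows about the Hive scan strategy.
  val sessionStrategies: Seq[Strategy] = Seq(streamScan, hiveScan)
  val sessionPlanner = new Planner(sessionStrategies)

  // A planner built with its own list, which omits the Hive scan strategy.
  val incrementalPlanner = new Planner(Seq(streamScan))

  // One possible remedy in this toy model: reuse the session's strategies.
  val fixedIncrementalPlanner = new Planner(sessionStrategies)

  def main(args: Array[String]): Unit = {
    val table = HiveTable("smallTable")
    println(Try(incrementalPlanner.plan(table)))   // a Failure: no strategy applies
    println(fixedIncrementalPlanner.plan(table))   // planned via hiveScan
  }
}
```

In this sketch the batch path and the streaming path differ only in which strategy list they were given, which is why the same table is readable through one planner and not the other.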
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)