Barry Becker created SPARK-19699:
------------------------------------
Summary: createOrReplaceTempView does not always replace an existing table of the same name
Key: SPARK-19699
URL: https://issues.apache.org/jira/browse/SPARK-19699
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 2.1.0
Reporter: Barry Becker
Priority: Minor
There are cases when dataframe.createOrReplaceTempView does not replace an
existing table with the same name.
Please also refer to my [related stack-overflow
post|http://stackoverflow.com/questions/42371690/in-spark-2-1-how-come-the-dataframe-createoreplacetemptable-does-not-replace-an].
To reproduce, do
{code}
// Register the dataframe as the temp view "foo1" and cache it.
df.collect()
df.createOrReplaceTempView("foo1")
df.sqlContext.cacheTable("foo1")
{code}
with one dataframe, and then repeat exactly the same steps with a different
dataframe. Then look at the Storage tab in the Spark UI: there are multiple
entries for "foo1" in the "RDD Name" column (a full sketch follows below).
Maybe I am misunderstanding, but this causes two apparent problems:
1) How do you know which table will be retrieved by sqlContext.table("foo1")?
2) The duplicate entries represent a memory leak.
I have tried calling dropTempTable(existingName) first, but then have
occasionally seen a FAILFAST error when trying to use the table afterwards. It
is as if dropTempTable is not synchronous, but maybe I am doing something wrong.
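Roughly, the attempted workaround looks like the sketch below, written against
the SparkSession catalog API (dropTempView/uncacheTable being the
non-deprecated counterparts of dropTempTable); the explicit uncache step and
the Try guard are assumptions for illustration, not a confirmed fix:
{code}
import scala.util.Try

// Attempted workaround (sketch): explicitly uncache and drop the old view
// before re-registering it. The Try guard only covers the case where "foo1"
// does not exist or is not cached yet.
Try(spark.catalog.uncacheTable("foo1"))
spark.catalog.dropTempView("foo1")   // no-op if the view does not exist

df2.createOrReplaceTempView("foo1")
df2.sqlContext.cacheTable("foo1")
{code}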