Barry Becker created SPARK-19699:
------------------------------------

             Summary: createOrReplaceTempView does not always replace an existing 
table of the same name
                 Key: SPARK-19699
                 URL: https://issues.apache.org/jira/browse/SPARK-19699
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 2.1.0
            Reporter: Barry Becker
            Priority: Minor


There are cases where DataFrame.createOrReplaceTempView does not replace an 
existing table with the same name.
Please also refer to my [related stack-overflow 
post|http://stackoverflow.com/questions/42371690/in-spark-2-1-how-come-the-dataframe-createoreplacetemptable-does-not-replace-an].

To reproduce, do
{code}
df.collect()
df.createOrReplaceTempView("foo1")
df.sqlContext.cacheTable("foo1")
{code}

with one dataframe, and then do exactly the same thing with a different 
dataframe. Then look at the Storage tab in the Spark UI and you will see 
multiple entries for "foo1" in the "RDD Name" column.

Maybe I am misunderstanding, but this causes two apparent problems:
1) How do you know which table will be retrieved by sqlContext.table("foo1")?
2) The duplicate entries represent a memory leak. I have tried calling 
dropTempTable(existingName) first (see the sketch below), but then have 
occasionally seen a FAILFAST error when trying to use the table. It's as if 
dropTempTable is not synchronous, but maybe I am doing something wrong.
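
For reference, and continuing from the sketch above, the drop-then-recreate 
workaround I tried looks roughly like this (names are illustrative and this may 
not be the exact code that produced the FAILFAST error):

{code}
val name = "foo1"

// Drop any previously registered temp view with this name first
// (spark.catalog.dropTempView(name) is the newer equivalent).
df2.sqlContext.dropTempTable(name)

// Re-register and cache the new dataframe under the same name.
df2.createOrReplaceTempView(name)
df2.sqlContext.cacheTable(name)

// Using the view afterwards occasionally fails, as if the drop had not yet taken effect.
spark.table(name).count()
{code}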


