[ https://issues.apache.org/jira/browse/SPARK-21478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xiao Li resolved SPARK-21478.
-----------------------------
Resolution: Not A Problem
The current cache design must guarantee query correctness, so unpersisting a
DataFrame also drops the cached data of DataFrames derived from it. If you want
to keep the intermediate data, even if it is stale, you need to materialize it
by saving it as a table.
Thanks for reporting it. We might need to clarify this in the documentation.
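The cascading behavior described above can be illustrated with a small, self-contained toy model. This is a sketch under stated assumptions, not Spark's actual cache manager: plans are compared by plain structural equality (real Spark plan matching is more involved), and `Plan`, `LocalData`, `SavedTable`, `Project`, `ToyCache` and the table name `x11_snapshot` are hypothetical names introduced only for illustration. It shows why `x11` is dropped together with `x1`, and why a copy saved as a table survives.

```scala
// Toy model only (NOT Spark's CacheManager): a cache keyed by logical plans,
// where uncaching a plan also drops every cached plan built on top of it.
sealed trait Plan {
  // True if `other` is this plan or appears anywhere below it in the tree.
  def dependsOn(other: Plan): Boolean =
    this == other || (this match {
      case Project(_, child) => child.dependsOn(other)
      case _                 => false
    })
}
// Hypothetical leaf for in-memory data such as Seq(1).toDF().
final case class LocalData(name: String) extends Plan
// Hypothetical leaf for data materialized as a table; it has no parent plan.
final case class SavedTable(name: String) extends Plan
// Hypothetical projection node, e.g. select($"value" * 2).
final case class Project(expr: String, child: Plan) extends Plan

final class ToyCache {
  private var cached = Vector.empty[Plan]

  def persist(p: Plan): Unit = {
    if (!cached.contains(p)) cached = cached :+ p
  }

  def isCached(p: Plan): Boolean = cached.contains(p)

  // Cascading uncache: remove `p` and every cached plan that has `p` as a
  // subtree, since their cached data was derived from `p`.
  def unpersist(p: Plan): Unit = {
    cached = cached.filterNot(_.dependsOn(p))
  }
}

object CascadeDemo extends App {
  val cache = new ToyCache
  val x1    = LocalData("Seq(1)")
  val x11   = Project("value * 2", x1)
  cache.persist(x1)
  cache.persist(x11)

  // Materialized copy of x11: a new base relation, not a subtree of x1's plan.
  val snapshot = SavedTable("x11_snapshot")
  cache.persist(snapshot)

  cache.unpersist(x1)
  println(cache.isCached(x1))       // false
  println(cache.isCached(x11))      // false: dropped along with x1
  println(cache.isCached(snapshot)) // true: independent of x1
}
```

In Spark itself, the materialization step suggested above would look roughly like `x11.write.saveAsTable("x11_snapshot")` followed by `spark.table("x11_snapshot").persist()`. Because the table is read back from storage, its plan no longer contains `x1`'s plan, so unpersisting `x1` leaves it cached; the trade-off is that the saved data will not reflect later changes to the source.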
> Unpersist a DF also unpersists related DFs
> ------------------------------------------
>
> Key: SPARK-21478
> URL: https://issues.apache.org/jira/browse/SPARK-21478
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.1.1, 2.2.0
> Reporter: Roberto Mirizzi
>
> Starting with Spark 2.1.1, I observed this bug. Here are the steps to
> reproduce it:
> # create a DF
> # persist it
> # count the items in it
> # create a new DF as a transformation of the previous one
> # persist it
> # count the items in it
> # unpersist the first DF
> Once you do that, you will see that the 2nd DF is no longer cached either.
> The code to reproduce it is:
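> {code:scala}
> // Assumes a SparkSession named `spark` (predefined in spark-shell)
> import spark.implicits._
>
> val x1 = Seq(1).toDF()
> x1.persist()
> x1.count() // triggers an action so the cache is populated
> assert(x1.storageLevel.useMemory)
> val x11 = x1.select($"value" * 2) // derived from x1
> x11.persist()
> x11.count()
> assert(x11.storageLevel.useMemory)
> x1.unpersist()
> assert(!x1.storageLevel.useMemory)
> // The following assertion FAILS: unpersisting x1 also unpersisted x11
> assert(x11.storageLevel.useMemory)
> {code}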
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)