[
https://issues.apache.org/jira/browse/SPARK-21478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16106212#comment-16106212
]
Xiao Li commented on SPARK-21478:
---------------------------------
This is by design. We do not want to use the invalid cached data. See the
original PR https://github.com/apache/spark/pull/17097 and the discussion in
the PR.
> Unpersist a DF also unpersists related DFs
> ------------------------------------------
>
> Key: SPARK-21478
> URL: https://issues.apache.org/jira/browse/SPARK-21478
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.1.1, 2.2.0
> Reporter: Roberto Mirizzi
>
> Starting with Spark 2.1.1 I observed this bug. Here's are the steps to
> reproduce it:
> # create a DF
> # persist it
> # count the items in it
> # create a new DF as a transformation of the previous one
> # persist it
> # count the items in it
> # unpersist the first DF
> Once you do that you will see that also the 2nd DF is gone.
> The code to reproduce it is:
> {code:java}
> val x1 = Seq(1).toDF()
> x1.persist()
> x1.count()
> assert(x1.storageLevel.useMemory)
> val x11 = x1.select($"value" * 2)
> x11.persist()
> x11.count()
> assert(x11.storageLevel.useMemory)
> x1.unpersist()
> assert(!x1.storageLevel.useMemory)
> //the following assertion FAILS
> assert(x11.storageLevel.useMemory)
> {code}
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]