[ 
https://issues.apache.org/jira/browse/SPARK-20683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16003864#comment-16003864
 ] 

Shea Parkes commented on SPARK-20683:
-------------------------------------

For anyone that found this issue and just wants to revert to the old behavior 
in their own fork, the following change in 
{{sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala}} 
worked for us:

{code}
-      if (cd.plan.find(_.sameResult(plan)).isDefined) {
+      if (cd.plan.sameResult(plan)) {
{code}

> Make table uncache chaining optional
> ------------------------------------
>
>                 Key: SPARK-20683
>                 URL: https://issues.apache.org/jira/browse/SPARK-20683
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.1.1
>         Environment: Not particularly environment sensitive.  
> Encountered/tested on Linux and Windows.
>            Reporter: Shea Parkes
>
> A recent change was made in SPARK-19765 that causes table uncaching to chain. 
>  That is, if table B is a child of table A, and they are both cached, now 
> uncaching table A will automatically uncache table B.
> At first I did not understand the need for this, but when reading the unit 
> tests, I see that it is likely that many people do not keep named references 
> to the child table (e.g. B).  Perhaps B is just made and cached as some part 
> of data exploration.  In that situation, it makes sense for B to 
> automatically be uncached when you are finished with A.
> However, we commonly utilize a different design pattern that is now harmed by 
> this automatic uncaching.  It is common for us to cache table A to then make 
> two, independent children tables (e.g. B and C).  Once those two child tables 
> are realized and cached, we'd then uncache table A (as it was no longer 
> needed and could be quite large).  After this change now, when we uncache 
> table A, we suddenly lose our cached status on both table B and C (which is 
> quite frustrating).  All of these tables are often quite large, and we view 
> what we're doing as mindful memory management.  We are maintaining named 
> references to B and C at all times, so we can always uncache them ourselves 
> when it makes sense.
> Would it be acceptable/feasible to make this table uncache chaining optional? 
>  I would be fine if the default is for the chaining to happen, as long as we 
> can turn it off via parameters.
> If acceptable, I can try to work towards making the required changes.  I am 
> most comfortable in Python (and would want the optional parameter surfaced in 
> Python), but have found the places required to make this change in Scala 
> (since I reverted the functionality in a private fork already).  Any help 
> would be greatly appreciated however.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to