[
https://issues.apache.org/jira/browse/SPARK-20683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hyukjin Kwon updated SPARK-20683:
---------------------------------
Labels: bulk-closed (was: )
> Make table uncache chaining optional
> ------------------------------------
>
> Key: SPARK-20683
> URL: https://issues.apache.org/jira/browse/SPARK-20683
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.1.1
> Environment: Not particularly environment sensitive.
> Encountered/tested on Linux and Windows.
> Reporter: Shea Parkes
> Priority: Major
> Labels: bulk-closed
>
> A recent change was made in SPARK-19765 that causes table uncaching to chain.
> That is, if table B is a child of table A, and they are both cached, now
> uncaching table A will automatically uncache table B.
> At first I did not understand the need for this, but after reading the unit
> tests, I see that many people likely do not keep named references to the
> child table (e.g. B). Perhaps B is just made and cached as some part of data
> exploration. In that situation, it makes sense for B to be uncached
> automatically when you are finished with A.
> However, we commonly use a different design pattern that this automatic
> uncaching now harms. We often cache table A and then build two independent
> child tables from it (e.g. B and C). Once those two child tables are
> realized and cached, we uncache table A (as it is no longer needed and can
> be quite large). Since this change, uncaching table A also drops the cached
> status of both B and C, which is quite frustrating. All of these tables are
> often quite large, and we view what we're doing as mindful memory
> management. We maintain named references to B and C at all times, so we can
> always uncache them ourselves when it makes sense.
> Would it be acceptable/feasible to make this table uncache chaining optional?
> I would be fine if the default is for the chaining to happen, as long as we
> can turn it off via parameters.
> If acceptable, I can try to work towards making the required changes. I am
> most comfortable in Python (and would want the optional parameter surfaced in
> Python), but I have found the places in Scala where this change is required
> (since I already reverted the functionality in a private fork). Any help
> would be greatly appreciated, however.
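
For readers skimming the request, the difference between chained and optional uncaching can be sketched with a small Spark-free toy model. Nothing below is Spark's API: `CacheRegistry`, `run_pattern`, and the `cascade` flag are hypothetical names used only to illustrate the behavior described in the quoted issue, on the assumption that B and C are both derived from A.

```python
# Toy, Spark-free model of cached tables; every name here is hypothetical.
class CacheRegistry:
    """Tracks which tables are cached and which cached tables derive from them."""

    def __init__(self):
        self._cached = set()
        self._children = {}  # parent name -> set of child names

    def cache(self, name, parent=None):
        """Mark `name` as cached, optionally recording the table it derives from."""
        self._cached.add(name)
        if parent is not None:
            self._children.setdefault(parent, set()).add(name)

    def uncache(self, name, cascade=True):
        """Drop `name`; when cascade is True, also drop its cached descendants."""
        self._cached.discard(name)
        if cascade:
            for child in sorted(self._children.get(name, ())):
                self.uncache(child, cascade=True)

    def is_cached(self, name):
        return name in self._cached


def run_pattern(cascade):
    """Cache A, derive and cache B and C from it, then release A."""
    registry = CacheRegistry()
    registry.cache("A")
    registry.cache("B", parent="A")
    registry.cache("C", parent="A")
    registry.uncache("A", cascade=cascade)
    return {name: registry.is_cached(name) for name in ("A", "B", "C")}


print(run_pattern(cascade=True))   # chaining on: A, B and C are all dropped
print(run_pattern(cascade=False))  # chaining off: only A is dropped
```

With `cascade=True` the model reproduces the chained behavior the reporter describes; with `cascade=False` B and C stay cached after A is released, which is the opt-out being asked for.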