[ 
https://issues.apache.org/jira/browse/SPARK-11161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961501#comment-14961501
 ] 

Ryan Williams commented on SPARK-11161:
---------------------------------------

Digging in to the code a bit, I determined that this likely has to do with the 
timing of driver GC cycles and is specific to RDDs which I am not retaining a 
reference to in my {{spark-shell}}.

Is it intended that cached RDDs be unpersisted when user code no longer retains 
a reference to them? That appears to be a necessary condition for what I'm 
observing.

> Viewing the web UI for the first time unpersists a cached RDD
> -------------------------------------------------------------
>
>                 Key: SPARK-11161
>                 URL: https://issues.apache.org/jira/browse/SPARK-11161
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, Web UI
>    Affects Versions: 1.5.1
>            Reporter: Ryan Williams
>            Priority: Minor
>
> This one is a real head-scratcher. [Here's a 
> screencast|http://f.cl.ly/items/0P0N413t1V3j2B0A3V1a/Screen%20Recording%202015-10-16%20at%2005.43%20PM.gif]:
> !http://f.cl.ly/items/0P0N413t1V3j2B0A3V1a/Screen%20Recording%202015-10-16%20at%2005.43%20PM.gif!
> The three windows, left-to-right, are: 
> * a {{spark-shell}} on YARN with dynamic allocation enabled, at rest with one 
> executor. [Here's an example app's 
> environment|https://gist.github.com/ryan-williams/6dd3502d5d0de2f030ac].
> * [Spree|https://github.com/hammerlab/spree], opened to the above app's 
> "Storage" tab.
> * my YARN resource manager, showing a link to the web UI running on the 
> driver.
> At the start, nothing has been run in the shell, and I've not visited the web 
> UI.
> I run a simple job in the shell and cache a small RDD that it computes:
> {code}
> sc.parallelize(1 to 100000000, 100).map(_ % 100 -> 1).reduceByKey(_+_, 
> 100).setName("foo").cache.count
> {code}
> As the second stage runs, you can see the partitions show up as cached in 
> Spree.
> After the job finishes, a few requested executors continue to fill in, which 
> you can see in the console at left or the nav bar of Spree in the middle.
> Once that has finished, everything is at rest with the RDD "foo" 100% cached.
> Then, I click the YARN RM's "ApplicationMaster" link which loads the web UI 
> on the driver for the first time.
> Immediately, the console prints some activity, including that RDD 2 has been 
> removed:
> {code}
> 15/10/16 21:43:12 INFO storage.BlockManagerInfo: Removed broadcast_1_piece0 
> on 172.29.46.15:33156 in memory (size: 1517.0 B, free: 7.2 GB)
> 15/10/16 21:43:12 INFO storage.BlockManagerInfo: Removed broadcast_1_piece0 
> on demeter-csmaz10-17.demeter.hpc.mssm.edu:56997 in memory (size: 1517.0 B, 
> free: 12.2 GB)
> 15/10/16 21:43:13 INFO spark.ContextCleaner: Cleaned accumulator 2
> 15/10/16 21:43:13 INFO storage.BlockManagerInfo: Removed broadcast_0_piece0 
> on 172.29.46.15:33156 in memory (size: 1666.0 B, free: 7.2 GB)
> 15/10/16 21:43:13 INFO storage.BlockManagerInfo: Removed broadcast_0_piece0 
> on demeter-csmaz10-17.demeter.hpc.mssm.edu:56997 in memory (size: 1666.0 B, 
> free: 12.2 GB)
> 15/10/16 21:43:13 INFO spark.ContextCleaner: Cleaned accumulator 1
> 15/10/16 21:43:13 INFO spark.ContextCleaner: Cleaned shuffle 0
> 15/10/16 21:43:13 INFO storage.BlockManager: Removing RDD 2
> 15/10/16 21:43:13 INFO spark.ContextCleaner: Cleaned RDD 2
> {code}
> Accordingly, Spree shows that the RDD has been unpersisted, and I can see in 
> the event log (not pictured in the screencast) that an Unpersist event has 
> made its way through the various SparkListeners:
> {code}
> {"Event":"SparkListenerUnpersistRDD","RDD ID":2}
> {code}
> Simply loading the web UI causes an RDD unpersist event to fire!
> I can't nail down exactly what's causing this, and I've seen evidence that 
> there are other sequences of events that can also cause it:
> * I've repro'd the above steps ~20 times. The RDD always gets unpersisted 
> when I've not visited the web UI until the RDD is cached, and when the app is 
> dynamically allocating executors.
> * One time, I observed the unpersist to fire without my even visiting the web 
> UI at all. Other times I wait a long time before visiting the web UI, so that 
> it is clear that the loading of the web UI is causal, and it always is, but 
> apparently there's another way for the unpersist to happen, seemingly rarely, 
> without visiting the web UI.
> * I tried a couple of times without dynamic allocation and could not 
> reproduce it.
> * I've tried a couple of times with dynamic allocation and starting with a 
> higher minimum number of executors than 1 and have been unable to reproduce 
> it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to