[ https://issues.apache.org/jira/browse/SPARK-21478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16095187#comment-16095187 ]
Roberto Mirizzi commented on SPARK-21478:
-----------------------------------------

It's odd that you are not able to reproduce it. Did you just launch the spark-shell and copy/paste the commands above? I tried Spark 2.1.0, 2.1.1, and 2.2.0, both on AWS and on my local machine. Spark 2.1.0 does not exhibit the issue; Spark 2.1.1 and 2.2.0 both fail the last assertion.

About your point "I do think one wants to be able to persist the result and not the original though": it depends on the specific use case. My example was a trivial one to reproduce the problem, but as you can imagine, you may want to persist the first DF, do a bunch of operations reusing it, then generate a new DF, persist it, and unpersist the old one when you no longer need it. It looks like a serious problem to me.

This is my entire output:

{code:java}
$ spark-shell
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
17/07/20 11:44:40 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/07/20 11:44:44 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
Spark context Web UI available at http://10.15.16.46:4040
Spark context available as 'sc' (master = local[*], app id = local-1500576281010).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.2.0
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_71)
Type in expressions to have them evaluated.
Type :help for more information.
scala> val x1 = Seq(1).toDF()
x1: org.apache.spark.sql.DataFrame = [value: int]

scala> x1.persist()
res0: x1.type = [value: int]

scala> x1.count()
res1: Long = 1

scala> assert(x1.storageLevel.useMemory)

scala> val x11 = x1.select($"value" * 2)
x11: org.apache.spark.sql.DataFrame = [(value * 2): int]

scala> x11.persist()
res3: x11.type = [(value * 2): int]

scala> x11.count()
res4: Long = 1

scala> assert(x11.storageLevel.useMemory)

scala> x1.unpersist()
res6: x1.type = [value: int]

scala> assert(!x1.storageLevel.useMemory)

scala> //the following assertion FAILS

scala> assert(x11.storageLevel.useMemory)
java.lang.AssertionError: assertion failed
  at scala.Predef$.assert(Predef.scala:156)
  ... 48 elided

scala>
{code}

> Unpersist a DF also unpersists related DFs
> ------------------------------------------
>
>                 Key: SPARK-21478
>                 URL: https://issues.apache.org/jira/browse/SPARK-21478
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.1.1, 2.2.0
>            Reporter: Roberto Mirizzi
>
> Starting with Spark 2.1.1 I observed this bug. Here are the steps to
> reproduce it:
> # create a DF
> # persist it
> # count the items in it
> # create a new DF as a transformation of the previous one
> # persist it
> # count the items in it
> # unpersist the first DF
> Once you do that, you will see that the 2nd DF is also gone.
> The code to reproduce it is:
> {code:java}
> val x1 = Seq(1).toDF()
> x1.persist()
> x1.count()
> assert(x1.storageLevel.useMemory)
>
> val x11 = x1.select($"value" * 2)
> x11.persist()
> x11.count()
> assert(x11.storageLevel.useMemory)
>
> x1.unpersist()
> assert(!x1.storageLevel.useMemory)
>
> //the following assertion FAILS
> assert(x11.storageLevel.useMemory)
> {code}
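Until the root cause is addressed, one defensive workaround (a sketch on my side, assuming the cascading unpersist merely drops the dependent DF's cache entry rather than invalidating the DF itself) is to check `storageLevel` after unpersisting the parent and re-cache the dependent DF if its entry was dropped:

{code:java}
import org.apache.spark.sql.SparkSession

// Local session just for this sketch; in a real app reuse the existing one.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("spark-21478-workaround")
  .getOrCreate()
import spark.implicits._

val x1 = Seq(1).toDF()
x1.persist()
x1.count()

val x11 = x1.select($"value" * 2)
x11.persist()
x11.count()

x1.unpersist()

// On 2.1.1/2.2.0 the cascading unpersist may have evicted x11 as well,
// so re-persist and re-materialize it defensively.
if (!x11.storageLevel.useMemory) {
  x11.persist()
  x11.count()
}
assert(x11.storageLevel.useMemory)
{code}

This pays the cost of recomputing x11 once, which is obviously not a real fix, but it keeps downstream jobs reading from the cache instead of silently recomputing the whole lineage on every action.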