Interesting notion at https://github.com/apache/spark/pull/23650 :

.unpersist() takes an optional 'blocking' argument. If true, the call
waits until the resource is actually freed; otherwise it returns
immediately while the resource is freed asynchronously.

The defaults look pretty inconsistent across the APIs:
- RDD: true
- Broadcast: true
- Dataset / DataFrame: false
- Graph (in GraphX): false
- Pyspark RDD: (no option)
- Pyspark Broadcast: false
- Pyspark DataFrame: false

I think false is a better default, as I'd expect it's much more likely
that the caller doesn't want to wait around while resources are freed,
especially as this happens on the driver. The possible downside is
that if the resources don't free up quickly, other operations might
temporarily have less memory available than they otherwise would.
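To make the trade-off concrete, here's a toy sketch of the two behaviors. This is not Spark's implementation; it's a hypothetical CachedResource class that just models what blocking=True vs. blocking=False means for the caller:

```python
import threading
import time

class CachedResource:
    """Toy model of unpersist(blocking=...); NOT Spark's actual implementation."""

    def __init__(self):
        self.freed = threading.Event()

    def _free(self):
        time.sleep(0.2)  # simulate slow cleanup (e.g. messages to executors)
        self.freed.set()

    def unpersist(self, blocking=False):
        t = threading.Thread(target=self._free)
        t.start()
        if blocking:
            t.join()  # caller waits until cleanup actually finishes

# blocking=True: the call doesn't return until the resource is freed
r1 = CachedResource()
r1.unpersist(blocking=True)
print(r1.freed.is_set())  # True

# blocking=False: the call returns immediately; cleanup is still in flight
r2 = CachedResource()
r2.unpersist(blocking=False)
print(r2.freed.is_set())  # False -- freed later, in the background
r2.freed.wait()           # eventually completes
```

With blocking=False the driver moves on right away, which is usually what you want; the cost is exactly the window where `freed` is still False and the memory hasn't been reclaimed yet.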

What about making the default false everywhere for Spark 3?
I raised it on dev@ just because that seems like a nontrivial behavior
change, but maybe it isn't controversial.

Sean

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
