[ https://issues.apache.org/jira/browse/SPARK-43408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon resolved SPARK-43408.
----------------------------------
Resolution: Invalid
> Spark caching in the context of a single job
> --------------------------------------------
>
> Key: SPARK-43408
> URL: https://issues.apache.org/jira/browse/SPARK-43408
> Project: Spark
> Issue Type: Question
> Components: Shuffle
> Affects Versions: 3.3.1
> Reporter: Faiz Halde
> Priority: Trivial
>
> Does caching benefit a Spark job that contains only a single action? IIRC,
> Spark already optimizes shuffles by persisting their output to disk.
> I am unable to find a counter-example where caching would benefit a job with
> a single action. In every case I can think of, the shuffle checkpoint acts as
> a good enough caching mechanism in itself.
> FWIW, I am talking specifically in the context of the DataFrame API. The
> StorageLevel allowed in my case is DISK_ONLY, i.e. I am not looking to speed
> things up by caching data in memory.
> To rephrase: is DISK_ONLY caching better than, or the same as, shuffle
> checkpointing in the context of a single action?
--
This message was sent by Atlassian Jira
(v8.20.10#820010)