[jira] [Resolved] (SPARK-43408) Spark caching in the context of a single job

Hyukjin Kwon (Jira) Sun, 14 May 2023 18:58:12 -0700


     [ 
https://issues.apache.org/jira/browse/SPARK-43408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Hyukjin Kwon resolved SPARK-43408.
----------------------------------
    Resolution: Invalid

> Spark caching in the context of a single job
> --------------------------------------------
>
>                 Key: SPARK-43408
>                 URL: https://issues.apache.org/jira/browse/SPARK-43408
>             Project: Spark
>          Issue Type: Question
>          Components: Shuffle
>    Affects Versions: 3.3.1
>            Reporter: Faiz Halde
>            Priority: Trivial
>
> Does caching benefit a Spark job that contains only a single action? IIRC, 
> Spark already optimizes shuffles by persisting them to disk.
> I am unable to find a counter-example where caching would benefit a job with 
> a single action. In every case I can think of, the shuffle checkpoint acts as 
> a good enough caching mechanism in itself.
> FWIW, I am talking specifically in the context of the DataFrame API. The 
> StorageLevel allowed in my case is DISK_ONLY, i.e. I am not looking to speed 
> things up by caching data in memory.
> To rephrase: is DISK_ONLY caching better than, or the same as, shuffle 
> checkpointing in the context of a single action?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
