As the documentation says, the Cache Manager is invoked only when the user explicitly calls a caching function (i.e. cache or persist) in the code. As far as I understand, this means that unless cache/persist is called explicitly, a job's results (including its inputs and intermediate results) are never stored for reuse.
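To make my understanding concrete, here is a toy model in plain Python. It is not Spark code and does not use Spark's API: the class `LazyDataset`, its `evaluations` counter, and the helper `run_two_actions` are names I made up for illustration. It only sketches the behavior I am assuming, namely that every action re-evaluates the whole lineage unless persist was requested explicitly.

```python
class LazyDataset:
    """Toy stand-in for a lazily evaluated dataset (hypothetical, not Spark's API)."""

    def __init__(self, compute):
        self._compute = compute          # zero-argument function that produces the rows
        self._persist_requested = False  # set only by an explicit persist() call
        self._cached = None              # stored rows, filled on first evaluation if persisted
        self.evaluations = 0             # how many times this dataset was actually computed

    def map(self, fn):
        # Transformations are lazy: they only extend the lineage.
        return LazyDataset(lambda: [fn(x) for x in self._materialize()])

    def persist(self):
        # Explicit request to keep the result after the first evaluation.
        self._persist_requested = True
        return self

    def _materialize(self):
        if self._cached is not None:
            return self._cached
        self.evaluations += 1
        rows = self._compute()
        if self._persist_requested:
            self._cached = rows
        return rows

    # Two "actions" that each trigger an evaluation of the lineage.
    def count(self):
        return len(self._materialize())

    def sum(self):
        return sum(self._materialize())


def run_two_actions(persist):
    """Run two actions on a derived dataset; return (derived, source) evaluation counts."""
    source = LazyDataset(lambda: list(range(5)))
    derived = source.map(lambda x: x * 2)
    if persist:
        derived.persist()
    derived.count()
    derived.sum()
    return derived.evaluations, source.evaluations
```

In this model, `run_two_actions(False)` evaluates both the source and the derived dataset twice, once per action, while `run_two_actions(True)` evaluates each of them only once. Nothing is ever reused implicitly, which is the behavior I am asking about.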
I am wondering whether there is any optimization of the query execution plan that applies an implicit caching mechanism without a call to cache/persist, or any other mechanism that can implicitly invoke the cache in some other situation. If I have understood correctly, is there a strong reason why the Catalyst Optimizer does not enforce any caching of intermediate results between jobs?