Re: caching a dataframe in Spark takes lot of time

Prem Sahoo Wed, 08 May 2024 05:40:00 -0700

Could anyone help me here?

> On May 7, 2024, at 4:30 PM, Prem Sahoo <prem.re...@gmail.com> wrote:
> 
> 
> Hello Folks,
> In Spark, I read a file, apply some transformations, and finally write the 
> result to HDFS.
> 
> Now I want to write the same dataframe to MapRFS as well, but for this 
> Spark will execute the full DAG again (recomputing all the previous steps: 
> the read plus the transformations).
> 
> I don't want this recomputation, so I decided to cache() the dataframe so 
> that the 2nd/nth write won't recompute all the steps.
> 
> But here is the catch: cache() takes a long time to persist the data in 
> memory.
> 
> My question: if the dataframe is already in memory, why does saving it to 
> another space in memory take so long (6 minutes for 3.2 GB of data)?
> 
> May I know which operations in cache() are taking such a long time?
> 
> I would appreciate it if someone could share this information.
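
For reference, the scenario described in the quoted message (one dataframe, cache(), then two writes) can be sketched roughly as below. This is only a minimal sketch under assumptions: the helper name write_to_both, the paths, the "parquet" default format, and the "overwrite" mode are all hypothetical and not taken from the original job. The calls used (cache, write.format, mode, save, unpersist) are the standard PySpark DataFrame API. Note that cache() is lazy in Spark: it only marks the dataframe, and the data is actually computed and stored by the first action that follows.

```python
def write_to_both(df, hdfs_path, maprfs_path, fmt="parquet"):
    """Write one DataFrame to two destinations while computing its DAG once.

    `df` is expected to be a pyspark.sql.DataFrame; paths and `fmt` are
    hypothetical placeholders chosen for illustration.
    """
    # cache() only marks the DataFrame for caching; nothing is computed yet.
    cached = df.cache()
    try:
        # First action: runs the read + transformations and fills the cache
        # as a side effect, so its wall time includes the whole pipeline.
        cached.write.format(fmt).mode("overwrite").save(hdfs_path)
        # Second action: reads the cached partitions where they are available
        # instead of re-reading and re-transforming the input.
        cached.write.format(fmt).mode("overwrite").save(maprfs_path)
    finally:
        # Release the cached blocks once both writes are done (or one failed).
        cached.unpersist()


# Hypothetical usage inside a Spark application:
#   write_to_both(transformed_df, "hdfs:///some/out", "maprfs:///some/out")
```

Timing the first write separately from the second is one way to see how much of the observed time belongs to the read and transformations rather than to the caching step itself.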

