Could anyone help me here?

Sent from my iPhone

> On May 7, 2024, at 4:30 PM, Prem Sahoo <prem.re...@gmail.com> wrote:
>
> Hello Folks,
> In Spark I have read a file, done some transformations, and finally written
> the result to HDFS.
>
> Now I am interested in writing the same dataframe to MapRFS, but for this
> Spark will execute the full DAG again (recompute all the previous steps: the
> read plus all the transformations).
>
> I don't want this recomputation, so I decided to cache() the dataframe so
> that the 2nd/nth write won't recompute all the steps.
>
> But here is the catch: cache() takes more time to persist the data in
> memory.
>
> My question: when the dataframe is already in memory, why does it take so
> long just to save it to another space in memory (6 minutes for 3.2 GB of
> data)?
>
> May I know which operations in cache() are taking such a long time?
>
> I would appreciate it if someone would share the information.
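For reference, the workflow described in the quoted message can be sketched roughly as below. This is only an illustration under assumptions, not the original job: the helper name write_to_targets, the Parquet output format, and the overwrite mode are all hypothetical, since the message does not say how the data is written. The one relevant Spark behaviour it shows is that persist()/cache() is lazy, so the cache is filled by the first action that runs afterwards, and that action still executes the full read and all transformations.

```python
def write_to_targets(df, target_paths):
    """Write one DataFrame to several locations, persisting it first.

    `df` is expected to behave like a Spark DataFrame. The function name,
    the Parquet format, and the overwrite mode are illustrative assumptions.
    """
    # persist() is lazy: it only marks the DataFrame for caching.
    # Nothing is computed or stored until an action runs.
    df.persist()
    try:
        for path in target_paths:
            # The first write runs the full DAG and fills the cache as a
            # side effect; later writes can reuse the cached partitions
            # instead of recomputing the read and transformations.
            df.write.mode("overwrite").parquet(path)
    finally:
        # Release the cached blocks once all writes are done.
        df.unpersist()
```

A call might look like write_to_targets(df, ["hdfs:///tmp/out", "maprfs:///tmp/out"]), where both paths are placeholders. Measured this way, the time attributed to cache() includes the read and every transformation, because they all run inside the first action.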