Hi all,

A short example before the long story:
    var accumulatedDataFrame = ... // initialize
    for (i <- 1 to 100) {
      val myTinyNewData = ... // my slowly calculated new data, in tiny portions
      accumulatedDataFrame = accumulatedDataFrame.union(myTinyNewData)
      // how do I keep only the values of accumulatedDataFrame here and forget its definitions?!
    }

This kind of loop tends to get slower with every iteration, even if myTinyNewData is quite compact. Usually I write accumulatedDataFrame to S3 and then load it back, just to clear its definition history (the lineage). That works, but it makes the code ugly. Is there a smarter way?

It happens very often that a DataFrame is created via complex definitions and then reused in several places. Sometimes it gets recalculated, triggering a heavy cascade of operations. Of course one could use .persist or .cache, but unfortunately their effect is not transparent: instead of speeding things up, they can cause a slow-down or even lost jobs if storage resources are insufficient.

Any advice?

Best regards,
Valery
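For concreteness, the write-and-reload workaround described above looks roughly like the sketch below. It is an illustration under assumptions, not tested against a specific Spark version: a local temp directory stands in for the S3 prefix, one-row `spark.range` DataFrames stand in for the real data behind the `...`, and the helper names `materialize` and `run`, as well as reloading every 10 iterations, are hypothetical choices. After each reload the DataFrame's plan is just a Parquet scan, so the chain of unions built so far is dropped.

```scala
import java.nio.file.Files
import org.apache.spark.sql.{DataFrame, SparkSession}

object AccumulateWithReload {
  // Hypothetical helper: write `df` out and read it straight back. The returned
  // DataFrame's plan is a plain Parquet scan, so the chain of unions that
  // produced `df` is no longer part of it.
  def materialize(spark: SparkSession, df: DataFrame, path: String): DataFrame = {
    df.write.mode("overwrite").parquet(path)
    spark.read.parquet(path)
  }

  // Hypothetical driver: runs the accumulation loop, reloading every `every`
  // iterations, and returns the final row count.
  def run(spark: SparkSession, iterations: Int = 100, every: Int = 10): Long = {
    // Assumption: a local temp directory stands in for the S3 prefix.
    val base = Files.createTempDirectory("accumulated").toString

    // Placeholder data in place of the "..." of the original example.
    var accumulatedDataFrame: DataFrame = spark.range(0).toDF("id") // empty start
    for (i <- 1 to iterations) {
      val myTinyNewData = spark.range(i, i + 1).toDF("id") // one row: id = i
      accumulatedDataFrame = accumulatedDataFrame.union(myTinyNewData)
      if (i % every == 0) {
        // Write each round to a fresh directory: overwriting the directory the
        // current DataFrame still reads from would delete its own input.
        accumulatedDataFrame =
          materialize(spark, accumulatedDataFrame, s"$base/round-$i")
      }
    }
    accumulatedDataFrame.count()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("accumulate-with-reload")
      .master("local[*]")
      .getOrCreate()
    try println(run(spark)) // prints 100 (ids 1 to 100)
    finally spark.stop()
  }
}
```

Reloading only every few iterations keeps the number of writes down while still bounding how long the plan can grow between reloads; reloading on every iteration would keep the plan minimal at the cost of one write per tiny data portion.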