Hi, sorry to bother everyone over the holidays, but I have found what may be a bug.
I am doing a "manual" streaming (see http://stackoverflow.com/questions/41266956/apache-spark-streaming-performance for the specific code) where I essentially read an additional dataframe each time from file, union it with previous dataframes to create a "window" and then do double aggregation on the result. Having looked at the documentation (https://spark.apache.org/docs/latest/programming-guide.html#which-storage-level-to-choose right above the headline) I expected spark to automatically cache the partial aggregation for each dataframe read and then continue with the aggregations from there. Instead it seems it reads each dataframe from file all over again. Is this a bug? Am I doing something wrong? Thanks. Assaf. -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Shuffle-intermidiate-results-not-being-cached-tp20358.html Sent from the Apache Spark Developers List mailing list archive at Nabble.com.