Hi, I'm building a system for near real-time data analytics. My plan is to have an ETL batch job that runs periodically and calculates aggregations. User queries are then parsed into on-demand calculations, also in Spark. Where should the pre-calculated results be saved? After the aggregations finish, the ETL job terminates, so its in-memory caches are wiped out. How can the on-demand queries make use of those results? More generally, could you please suggest a good way to organize the data flow and jobs to achieve this?
I'm new to Spark, so sorry if this sounds like a dumb question. Thank you.

Huy