Spark Streaming (DStreams) wasn't designed with event time in mind. Instead, we designed Structured Streaming to deal with event time naturally. You should check that out. Here are some pointers:
- Programming guide: http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html
- Blog posts:
  1. https://databricks.com/blog/2016/07/28/continuous-applications-evolving-streaming-in-apache-spark-2-0.html
  2. https://databricks.com/blog/2016/07/28/structured-streaming-in-apache-spark.html
- Talk: https://spark-summit.org/2016/events/a-deep-dive-into-structured-streaming/

On Sat, Jan 28, 2017 at 7:05 PM, kant kodali <kanth...@gmail.com> wrote:
> Hi All,
>
> I read through the documentation on Spark Streaming based on event time
> and how Spark handles lag w.r.t. processing time and so on, but what if
> the lag between the event time and the processing time is too long? In
> other words, what should I do if I am receiving yesterday's data (the
> timestamp on the message shows yesterday's date and time but the
> processing time is today's)? And say I also have a dashboard I want to
> update in real time (as in whenever I get the data) which shows the past
> 5 days' worth of data, and the dashboard just keeps updating.
>
> Thanks,
> kant
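Since the question is specifically about late data (event time far behind processing time), the relevant concept in Structured Streaming is the watermark: an aggregate for an event-time window stays open until the watermark passes it, so records that arrive within the allowed lateness still update the correct window, and anything older is dropped. Here is a toy Python sketch of that idea. This is illustrative only, not the Spark API; `run`, `assign_window`, and all parameter names are made up for the example:

```python
from collections import defaultdict

def assign_window(event_time, window_size):
    # Bucket an event-time timestamp into the start of its tumbling window.
    return event_time - (event_time % window_size)

def run(events, window_size, max_delay):
    """Toy event-time aggregation with a watermark.

    events: iterable of (event_time, value) pairs in arrival
    (processing-time) order. The watermark trails the maximum event
    time seen so far by max_delay; events older than the watermark are
    dropped, while everything else updates its event-time window
    regardless of how late it arrives.
    """
    counts = defaultdict(int)
    watermark = float("-inf")
    dropped = []
    for event_time, value in events:
        # Advance the watermark: it never moves backwards.
        watermark = max(watermark, event_time - max_delay)
        if event_time < watermark:
            # Too late: beyond the allowed lateness, so discard.
            dropped.append((event_time, value))
            continue
        counts[assign_window(event_time, window_size)] += value
    return dict(counts), dropped

# Out-of-order arrivals: the event at time 7 arrives after the one at 12
# but still lands in its correct window; the event at time 4 arrives after
# the watermark has advanced to 15 and is dropped.
counts, dropped = run(
    [(5, 1), (12, 1), (7, 1), (30, 1), (4, 1), (18, 1)],
    window_size=10,
    max_delay=15,
)
print(counts)   # windows keyed by their start time
print(dropped)  # events that arrived beyond the watermark
```

In real Structured Streaming, the same behavior comes from declaring a watermark on the event-time column and grouping by a window over it; for a dashboard showing the past 5 days, you would set the allowed lateness generously and let late rows update their original windows in update mode. See the programming guide above for the actual API.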