Hi,

I am sorry, I made a really bad typo. What I meant in my email was actually
Structured Streaming, so I wish I could do s/Spark Streaming/Structured
Streaming/g. Thanks for the pointers. It looks like what I was looking for is
actually watermarking, since my question is all about what I should do if my
data is 24 hours late, or in general what I should do if my data is late for
longer periods of time. Watermarking in Spark looks like a very new concept,
so let me do that reading!
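For anyone else following along, here is a minimal plain-Python sketch of the watermarking idea (not Spark's actual implementation): the engine tracks the maximum event time seen so far, subtracts a lateness threshold to get the watermark, and drops events older than that. In Structured Streaming the threshold is declared with `withWatermark("eventTime", "24 hours")` on the streaming DataFrame; the helper name `make_watermark_filter` below is just an illustration.

```python
from datetime import datetime, timedelta

def make_watermark_filter(delay_threshold: timedelta):
    """Keep events that are at most `delay_threshold` behind the
    maximum event time seen so far; drop anything older (too late)."""
    state = {"max_event_time": None}

    def accept(event_time: datetime) -> bool:
        # Advance the max observed event time.
        if state["max_event_time"] is None or event_time > state["max_event_time"]:
            state["max_event_time"] = event_time
        # Watermark trails the max event time by the allowed lateness.
        watermark = state["max_event_time"] - delay_threshold
        return event_time >= watermark

    return accept

accept = make_watermark_filter(timedelta(hours=24))
t0 = datetime(2017, 1, 29, 12, 0)
print(accept(t0))                        # on-time event -> True
print(accept(t0 - timedelta(hours=12)))  # 12h late, within threshold -> True
print(accept(t0 - timedelta(hours=30)))  # 30h late, behind watermark -> False
```

So with a 24-hour watermark, yesterday's data would still be aggregated as long as it arrives within the threshold; data later than that is dropped and its windows can be finalized.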

Thanks,
kant

On Sun, Jan 29, 2017 at 6:38 PM, Tathagata Das <tathagata.das1...@gmail.com>
wrote:

> Spark Streaming (DStreams) wasn't designed with event time in mind.
> Instead, we have designed Structured Streaming to deal naturally with event
> time. You should check that out. Here are the pointers.
>
> - Programming guide - http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html
> - Blog posts
>    1. https://databricks.com/blog/2016/07/28/continuous-applications-evolving-streaming-in-apache-spark-2-0.html
>    2. https://databricks.com/blog/2016/07/28/structured-streaming-in-apache-spark.html
> - Talk - https://spark-summit.org/2016/events/a-deep-dive-into-structured-streaming/
>
> On Sat, Jan 28, 2017 at 7:05 PM, kant kodali <kanth...@gmail.com> wrote:
>
>> Hi All,
>>
>> I read through the documentation on Spark Streaming based on event time
>> and how Spark handles lags w.r.t. processing time and so on, but what if
>> the lag between the event time and the processing time is too long? In
>> other words, what should I do if I am receiving yesterday's data (the
>> timestamp on the message shows yesterday's date and time but the
>> processing time is today's)? And say I also have a dashboard I want to
>> update in real time (as in, whenever I get the data) that shows the past
>> 5 days' worth of data and just keeps updating.
>>
>> Thanks,
>> kant
>>
>>
>
