Hi,

What would you expect? The data is simply dropped as that's the purpose of
watermarking it. That's my understanding at least.

Pozdrawiam,
Jacek Laskowski
----
https://about.me/JacekLaskowski
Mastering Spark SQL https://bit.ly/mastering-spark-sql
Spark Structured Streaming https://bit.ly/spark-structured-streaming
Mastering Kafka Streams https://bit.ly/mastering-kafka-streams
Follow me at https://twitter.com/jaceklaskowski

On Mon, Feb 5, 2018 at 8:11 PM, M Singh <mans2si...@yahoo.com> wrote:

> Just checking if anyone has more details on how watermark works in cases
> where event time is earlier than processing time stamp.
>
>
> On Friday, February 2, 2018 8:47 AM, M Singh <mans2si...@yahoo.com> wrote:
>
>
> Hi Vishu/Jacek:
>
> Thanks for your responses.
>
> Jacek - At the moment, the current time for my use case is processing time.
>
> Vishnu - Spark documentation (https://spark.apache.org/
> docs/latest/structured-streaming-programming-guide.html) does indicate
> that it can dedup using watermark.  So I believe there are more use cases
> for watermark and that is what I am trying to find.
>
> I am hoping that TD can clarify or point me to the documentation.
>
> Thanks
>
>
> On Wednesday, January 31, 2018 6:37 AM, Vishnu Viswanath <
> vishnu.viswanat...@gmail.com> wrote:
>
>
> Hi Mans,
>
> Watermark is Spark is used to decide when to clear the state, so if the
> even it delayed more than when the state is cleared by Spark, then it will
> be ignored.
> I recently wrote a blog post on this : http://vishnuviswanath.com/
> spark_structured_streaming.html#watermark
>
> Yes, this State is applicable for aggregation only. If you are having only
> a map function and don't want to process it, you could do a filter based on
> its EventTime field, but I guess you will have to compare it with the
> processing time since there is no API to access Watermark by the user.
>
> -Vishnu
>
> On Fri, Jan 26, 2018 at 1:14 PM, M Singh <mans2si...@yahoo.com.invalid>
> wrote:
>
> Hi:
>
> I am trying to filter out records which are lagging behind (based on event
> time) by a certain amount of time.
>
> Is the watermark api applicable to this scenario (ie, filtering lagging
> records) or it is only applicable with aggregation ?  I could not get a
> clear understanding from the documentation which only refers to it's usage
> with aggregation.
>
> Thanks
>
> Mans
>
>
>
>
>
>
>

Reply via email to