1. Yes. All times in event time, not processing time. So you may get 10AM event time data at 11AM processing time, but it will still be compared again all data within 9-10AM event times.
2. Show us your code. On Thu, Feb 27, 2020 at 2:30 AM lec ssmi <shicheng31...@gmail.com> wrote: > Hi: > I'm new to structured streaming. Because the built-in API cannot > perform the Count Distinct operation of Window, I want to use > dropDuplicates first, and then perform the window count. > But in the process of using, there are two problems: > 1. Because it is streaming computing, in the process of > deduplication, the state needs to be cleared in time, which requires the > cooperation of watermark. Assuming my event time field is consistently > increasing, and I set the watermark to 1 hour, does it mean > that the data at 10 o'clock will only be compared in these data from 9 > o'clock to 10 o'clock, and the data before 9 o'clock will be cleared ? > 2. Because it is window deduplication, I set the watermark > before deduplication to the window size.But after deduplication, I need to > call withWatermark () again to set the watermark to the real > watermark. Will setting the watermark again take effect? > > Thanks a lot ! >