Repository: flink
Updated Branches:
  refs/heads/master d1475ee86 -> f98af8cc1


[FLINK-5375] [doc] Fix Watermark Semantics

This closes #3185.


Project: http://git-wip-us.apache.org/repos/asf/flink/repo
Commit: http://git-wip-us.apache.org/repos/asf/flink/commit/f98af8cc
Tree: http://git-wip-us.apache.org/repos/asf/flink/tree/f98af8cc
Diff: http://git-wip-us.apache.org/repos/asf/flink/diff/f98af8cc

Branch: refs/heads/master
Commit: f98af8cc1d997280d307a4942539957e35ee7141
Parents: d1475ee
Author: Tzu-Li (Gordon) Tai <[email protected]>
Authored: Fri Jan 20 19:12:27 2017 +0100
Committer: Tzu-Li (Gordon) Tai <[email protected]>
Committed: Tue Jan 24 17:55:15 2017 +0800

----------------------------------------------------------------------
 docs/dev/event_time.md                  | 18 ++++++++++--------
 docs/monitoring/debugging_event_time.md |  2 +-
 2 files changed, 11 insertions(+), 9 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/flink/blob/f98af8cc/docs/dev/event_time.md
----------------------------------------------------------------------
diff --git a/docs/dev/event_time.md b/docs/dev/event_time.md
index 0f6c730..83151da 100644
--- a/docs/dev/event_time.md
+++ b/docs/dev/event_time.md
@@ -57,7 +57,7 @@ Flink supports different notions of *time* in streaming 
programs.
     which is the mechanism that signals time progress in event time. The 
mechanism is
     described below.
 
-    Event time processing often incurs a certain latency, due to it nature of 
waiting a certain time for
+    Event time processing often incurs a certain latency, due to its nature of 
waiting a certain time for
     late events and out-of-order events. Because of that, event time programs 
are often combined with
     *processing time* operations.
 
@@ -66,7 +66,7 @@ Flink supports different notions of *time* in streaming 
programs.
     refer to that timestamp.
 
     *Ingestion Time* sits conceptually in between *Event Time* and *Processing 
Time*. Compared to
-    *Processing Time*, it is slightly more expensive, but gives more 
predictable results: Because
+    *Processing Time*, it is slightly more expensive, but gives more 
predictable results: because
     *Ingestion Time* uses stable timestamps (assigned once at the source), 
different window operations
     over the records will refer to the same timestamp, whereas in *Processing 
Time* each window operator
     may assign the record to a different window (based on the local system 
clock and any transport delay).
@@ -131,8 +131,9 @@ stream
 </div>
 
 
-Note that in order to run this example in *Event Time*, the program needs to 
use either an event time
-source, or inject a *Timestamp Assigner & Watermark Generator*. Those 
functions describe how to access
+Note that in order to run this example in *Event Time*, the program needs to 
use either sources
+that directly define event time for the data and emits Watermarks themselves, 
or
+inject a *Timestamp Assigner & Watermark Generator* after the sources. Those 
functions describe how to access
 the event timestamps, and what timely out-of-orderness the event stream 
exhibits.
 
 The section below describes the general mechanism behind *Timestamps* and 
*Watermarks*. For a guide on how
@@ -142,7 +143,7 @@ to use timestamp assignment and watermark generation in the 
Flink DataStream API
 
 # Event Time and Watermarks
 
-*Note: Flink implements many techniques from the Dataflow Model. For a good 
introduction to Event Time and, have also a look at these articles*
+*Note: Flink implements many techniques from the Dataflow Model. For a good 
introduction to Event Time and Watermarks, have also a look at the below 
articles.*
 
   - [Streaming 
101](https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-101) by 
Tyler Akidau
   - The [Dataflow Model 
paper](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43864.pdf)
@@ -152,7 +153,7 @@ A stream processor that supports *event time* needs a way 
to measure the progres
 For example, a window operator that builds hourly windows needs to be notified 
when event time has reached the
 next full hour, such that the operator can close the next window.
 
-*Event Time* can progress independently of *Processing Time* (measures by wall 
clocks).
+*Event Time* can progress independently of *Processing Time* (measured by wall 
clocks).
 For example, in one program, the current *event time* of an operator can trail 
slightly behind the processing time
 (accounting for a delay in receiving the latest elements) and both proceed at 
the same speed. In another streaming
 program, which reads fast-forward through some data already buffered in a 
Kafka topic (or another message queue), event time
@@ -162,7 +163,8 @@ can progress by weeks in seconds.
 
 The mechanism in Flink to measure progress in event time is **Watermarks**.
 Watermarks flow as part of the data stream and carry a timestamp *t*. A 
*Watermark(t)* declares that event time has reached time
-*t* in that stream, meaning that all events with a timestamps *t' < t* have 
occurred.
+*t* in that stream, meaning that there should be no more elements from the 
stream with a timestamp *t' <= t* (i.e. events with timestamps
+older or equal to the watermark).
 
 The figure below shows a stream of events with (logical) timestamps, and 
watermarks flowing inline. The events are in order
 (with respect to their timestamp), meaning that watermarks are simply periodic 
markers in the stream with an in-order timestamp.
@@ -196,7 +198,7 @@ The figure below shows an example of events and watermarks 
flowing through paral
 ## Late Elements
 
 It is possible that certain elements violate the watermark condition, meaning 
that even after the *Watermark(t)* has occurred,
-more elements with timestamp *t' < t* will occur. In fact, in many real world 
setups, certain elements can be arbitrarily
+more elements with timestamp *t' <= t* will occur. In fact, in many real world 
setups, certain elements can be arbitrarily
 delayed, making it impossible to define a time when all elements of a certain 
event timestamp have occurred.
 Further more, even if the lateness can be bounded, delaying the watermarks by 
too much is often not desirable, because it delays
 the evaluation of the event time windows by too much.

http://git-wip-us.apache.org/repos/asf/flink/blob/f98af8cc/docs/monitoring/debugging_event_time.md
----------------------------------------------------------------------
diff --git a/docs/monitoring/debugging_event_time.md 
b/docs/monitoring/debugging_event_time.md
index 6260779..eaf0d5c 100644
--- a/docs/monitoring/debugging_event_time.md
+++ b/docs/monitoring/debugging_event_time.md
@@ -28,7 +28,7 @@ under the License.
 ## Monitoring Current Event Time
 
 Flink's [event time]({{ site.baseurl }}/dev/event_time.html) and watermark 
support is a powerful feature for handling
-out-of-order events. However, its harder to understand what exactly is going 
on because the progress of time
+out-of-order events. However, it's harder to understand what exactly is going 
on because the progress of time
 is tracked within the system.
 
 There are plans (see 
[FLINK-3427](https://issues.apache.org/jira/browse/FLINK-3427)) to show the 
current low watermark

Reply via email to