NicoK commented on a change in pull request #11843:
URL: https://github.com/apache/flink/pull/11843#discussion_r412897634



##########
File path: docs/tutorials/streaming_analytics.md
##########
@@ -0,0 +1,460 @@
+---
+title: Streaming Analytics
+nav-id: analytics
+nav-pos: 4
+nav-title: Streaming Analytics
+nav-parent_id: tutorials
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+* This will be replaced by the TOC
+{:toc}
+
+## Event Time and Watermarks
+
+### Introduction
+
+Flink explicitly supports three different notions of time:
+
+* _event time:_ the time when an event occurred, as recorded by the device 
producing (or storing) the event
+
+* _ingestion time:_ a timestamp recorded by Flink at the moment it ingests the 
event
+
+* _processing time:_ the time when a specific operator in your pipeline is 
processing the event
+
+For reproducible results, e.g., when computing the maximum price a stock 
reached during the first
+hour of trading on a given day, you should use event time. In this way the 
result won't depend on
+when the calculation is performed. This kind of real-time application is 
sometimes performed using
+processing time, but then the results are determined by the events that happen 
to be processed
+during that hour, rather than the events that occurred then. Computing 
analytics based on processing
+time causes inconsistencies, and makes it difficult to re-analyze historic 
data or test new
+implementations.
+
+### Working with Event Time
+
+By default, Flink will use processing time. To change this, you can set the 
Time Characteristic:
+
+{% highlight java %}
+final StreamExecutionEnvironment env =
+    StreamExecutionEnvironment.getExecutionEnvironment();
+env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
+{% endhighlight %}
+
+If you want to use event time, you will also need to supply a Timestamp 
Extractor and Watermark
+Generator that Flink will use to track the progress of event time. This will 
be covered in the
+section below on [Working with Watermarks]({{ site.baseurl }}{% link
+tutorials/streaming_analytics.md %}#working-with-watermarks), but first we 
should explain what
+watermarks are.
+
+### Watermarks
+
+Let's work through a simple example that will show why watermarks are needed, 
and how they work.
+
+In this example you have a stream of timestamped events that arrive somewhat 
out of order, as shown
+below. The numbers shown are timestamps that indicate when these events 
actually occurred. The first
+event to arrive happened at time 4, and it is followed by an event that 
happened earlier, at time 2,
+and so on:
+
+    ··· 23 19 22 24 21 14 17 13 12 15 9 11 7 2 4 →

Review comment:
       this seems to look nice (I'll apply it to your PR unless you object):
   ```
   <div class="text-center" style="font-size: x-large; word-spacing: 0.5em; 
margin: 1em 0em;">
   ··· 23 19 22 24 21 14 17 13 12 15 9 11 7 2 4 →
   </div>
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to