NicoK commented on a change in pull request #11826:
URL: https://github.com/apache/flink/pull/11826#discussion_r411619829
##########
File path: docs/concepts/index.md
##########
@@ -27,20 +27,33 @@ specific language governing permissions and limitations
under the License.
-->
+The [Hands-on Tutorials]({{ site.baseurl }}{% link tutorials/index.md %}) explain the basic concepts
+of stateful and timely stream processing that underlie Flink's APIs, and provide examples of how
+these mechanisms are used in applications. Stateful stream processing is introduced in the context
+of [Data Pipelines & ETL]({{ site.baseurl }}{% link tutorials/etl.md %}#stateful-transformations)
+and is further developed in the section on [Fault Tolerance]({{ site.baseurl }}{% link
+tutorials/fault_tolerance.md %}). Timely stream processing is introduced in the section on
+[Streaming Analytics]({{ site.baseurl }}{% link tutorials/streaming_analytics.md %}).
+
+This _Concepts in Depth_ section provides a deeper understanding of how Flink's architecture and runtime
+implement these concepts.
+
+## Flink's APIs
+
Flink offers different levels of abstraction for developing streaming/batch applications.
<img src="{{ site.baseurl }}/fig/levels_of_abstraction.svg" alt="Programming levels of abstraction" class="offset" width="80%" />
- - The lowest level abstraction simply offers **stateful streaming**. It is
+ - The lowest level abstraction simply offers **stateful and timely stream processing**. It is
embedded into the [DataStream API]({{ site.baseurl}}{% link
dev/datastream_api.md %}) via the [Process Function]({{ site.baseurl }}{%
- link dev/stream/operators/process_function.md %}). It allows users freely
- process events from one or more streams, and use consistent fault tolerant
+ link dev/stream/operators/process_function.md %}). It allows users to freely
+ process events from one or more streams, and provides consistent, fault tolerant
*state*. In addition, users can register event time and processing time
callbacks, allowing programs to realize sophisticated computations.
- - In practice, most applications would not need the above described low level
- abstraction, but would instead program against the **Core APIs** like the
+ - In practice, many applications do not need the low level
Review comment:
Is it rather "low-level" as an adjective? (this occurs multiple times on this page)
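The "stateful and timely stream processing" abstraction discussed in this hunk can be illustrated outside of Flink. Below is a toy Python sketch of the idea, not Flink's actual ProcessFunction API: a per-key counter plus an event-time timer whose callback fires once the watermark passes it (all class and method names here are invented for illustration):

```python
# Toy model of "stateful and timely" per-event processing: each key keeps a
# running count, and a registered timer fires once the watermark passes its
# deadline. Illustration only -- not Flink's ProcessFunction API.
class ToyProcessFunction:
    def __init__(self):
        self.counts = {}   # per-key state
        self.timers = []   # (fire_time, key) pairs

    def process_element(self, key, timestamp):
        self.counts[key] = self.counts.get(key, 0) + 1
        self.timers.append((timestamp + 10, key))  # register an event-time timer

    def advance_watermark(self, watermark, out):
        # fire every timer whose deadline is at or before the watermark
        due = [(t, k) for (t, k) in self.timers if t <= watermark]
        self.timers = [(t, k) for (t, k) in self.timers if t > watermark]
        for _, key in sorted(due):
            out.append((key, self.counts[key]))

fn = ToyProcessFunction()
results = []
fn.process_element("a", 1)
fn.process_element("a", 3)
fn.process_element("b", 5)
fn.advance_watermark(12, results)  # timers at 11, 13, 15 -> only 11 fires
```

The point of the sketch is the combination the doc text describes: free-form per-event processing, consistent keyed *state*, and time-based callbacks in one operator.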
##########
File path: docs/concepts/index.md
##########
@@ -27,20 +27,33 @@ specific language governing permissions and limitations
under the License.
-->
+The [Hands-on Tutorials]({{ site.baseurl }}{% link tutorials/index.md %}) explain the basic concepts
+of stateful and timely stream processing that underlie Flink's APIs, and provide examples of how
+these mechanisms are used in applications. Stateful stream processing is introduced in the context
+of [Data Pipelines & ETL]({{ site.baseurl }}{% link tutorials/etl.md %}#stateful-transformations)
+and is further developed in the section on [Fault Tolerance]({{ site.baseurl }}{% link
+tutorials/fault_tolerance.md %}). Timely stream processing is introduced in the section on
+[Streaming Analytics]({{ site.baseurl }}{% link tutorials/streaming_analytics.md %}).
+
+This _Concepts in Depth_ section provides a deeper understanding of how Flink's architecture and runtime
+implement these concepts.
+
+## Flink's APIs
+
Flink offers different levels of abstraction for developing streaming/batch applications.
<img src="{{ site.baseurl }}/fig/levels_of_abstraction.svg" alt="Programming levels of abstraction" class="offset" width="80%" />
- - The lowest level abstraction simply offers **stateful streaming**. It is
+ - The lowest level abstraction simply offers **stateful and timely stream processing**. It is
embedded into the [DataStream API]({{ site.baseurl}}{% link
dev/datastream_api.md %}) via the [Process Function]({{ site.baseurl }}{%
- link dev/stream/operators/process_function.md %}). It allows users freely
- process events from one or more streams, and use consistent fault tolerant
+ link dev/stream/operators/process_function.md %}). It allows users to freely
+ process events from one or more streams, and provides consistent, fault tolerant
*state*. In addition, users can register event time and processing time
callbacks, allowing programs to realize sophisticated computations.
- - In practice, most applications would not need the above described low level
- abstraction, but would instead program against the **Core APIs** like the
+ - In practice, many applications do not need the low level
Review comment:
```suggestion
- In practice, many applications do not need the low-level
```
##########
File path: docs/concepts/index.md
##########
@@ -50,8 +63,8 @@ Flink offers different levels of abstraction for developing streaming/batch appl
respective programming languages.
The low level *Process Function* integrates with the *DataStream API*,
- making it possible to go the lower level abstraction for certain operations
- only. The *DataSet API* offers additional primitives on bounded data sets,
+ making it possible to use the lower level abstraction on an as-needed basis.
Review comment:
?
```suggestion
making it possible to use the lower-level abstraction on an as-needed basis.
```
##########
File path: docs/tutorials/index.md
##########
@@ -0,0 +1,186 @@
+---
+title: Hands-on Tutorials
+nav-id: tutorials
+nav-pos: 2
+nav-title: '<i class="fa fa-hand-paper-o title appetizer" aria-hidden="true"></i> Hands-on Tutorials'
+nav-parent_id: root
+nav-show_overview: true
+always-expand: true
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+* This will be replaced by the TOC
+{:toc}
+
+## Goals and Scope of these Tutorials
+
+These tutorials present an introduction to Apache Flink that includes just enough to get you started
+writing scalable streaming ETL, analytics, and event-driven applications, while leaving out a lot of
+(ultimately important) details. The focus is on providing straightforward introductions to Flink's
+APIs for managing state and time, with the expectation that having mastered these fundamentals,
+you'll be much better equipped to pick up the rest of what you need to know from the more detailed
+reference documentation. The links at the end of each page will lead you to where you can learn
+more.
+
+Specifically, you will learn:
+
+- how to implement streaming data processing pipelines
+- how and why Flink manages state
+- how to use event time to consistently compute accurate analytics
+- how to build event-driven applications on continuous streams
+- how Flink is able to provide fault-tolerant, stateful stream processing with exactly-once semantics
+
+These tutorials focus on four critical concepts: continuous processing of streaming data, event
+time, stateful stream processing, and state snapshots. This page introduces these concepts.
+
+{% info Note %} Accompanying these tutorials are a set of hands-on exercises that will guide you
+through learning how to work with the concepts being presented.
Review comment:
add link to http://github.com/apache/flink-training ?
##########
File path: docs/tutorials/index.md
##########
@@ -0,0 +1,186 @@
+---
+title: Hands-on Tutorials
+nav-id: tutorials
+nav-pos: 2
+nav-title: '<i class="fa fa-hand-paper-o title appetizer" aria-hidden="true"></i> Hands-on Tutorials'
+nav-parent_id: root
+nav-show_overview: true
+always-expand: true
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+* This will be replaced by the TOC
+{:toc}
+
+## Goals and Scope of these Tutorials
+
+These tutorials present an introduction to Apache Flink that includes just enough to get you started
+writing scalable streaming ETL, analytics, and event-driven applications, while leaving out a lot of
+(ultimately important) details. The focus is on providing straightforward introductions to Flink's
+APIs for managing state and time, with the expectation that having mastered these fundamentals,
+you'll be much better equipped to pick up the rest of what you need to know from the more detailed
+reference documentation. The links at the end of each page will lead you to where you can learn
+more.
+
+Specifically, you will learn:
+
+- how to implement streaming data processing pipelines
+- how and why Flink manages state
+- how to use event time to consistently compute accurate analytics
+- how to build event-driven applications on continuous streams
+- how Flink is able to provide fault-tolerant, stateful stream processing with exactly-once semantics
+
+These tutorials focus on four critical concepts: continuous processing of streaming data, event
+time, stateful stream processing, and state snapshots. This page introduces these concepts.
+
+{% info Note %} Accompanying these tutorials are a set of hands-on exercises that will guide you
+through learning how to work with the concepts being presented.
+
+{% top %}
+
+## Stream Processing
+
+Streams are data's natural habitat. Whether it's events from web servers, trades from a stock
+exchange, or sensor readings from a machine on a factory floor, data is created as part of a stream.
+But when you analyze data, you can either organize your processing around _bounded_ or _unbounded_
+streams, and which of these paradigms you choose has profound consequences.
+
+<img src="{{ site.baseurl }}/fig/bounded-unbounded.png" alt="Bounded and unbounded streams" class="offset" width="90%" />
+
+**Batch processing** is the paradigm at work when you process a bounded data stream. In this mode of
+operation you can choose to ingest the entire dataset before producing any results, which means that
+it's possible, for example, to sort the data, compute global statistics, or produce a final report
+that summarizes all of the input.
+
+**Stream processing**, on the other hand, involves unbounded data streams. Conceptually, at least,
+the input may never end, and so you are forced to continuously process the data as it arrives.
+
+In Flink, applications are composed of **streaming dataflows** that may be transformed by
+user-defined **operators**. These dataflows form directed graphs that start with one or more
+**sources**, and end in one or more **sinks**.
Review comment:
You could actually link to the glossary here, but I'm not sure whether it would confuse people if
they actually followed the links and read the information there (may be too much detail there);
however, it may be useful later.
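The source → operators → sink dataflow described in this hunk can be mimicked with plain Python generators. A hedged sketch of the shape of such a pipeline, not Flink code (all function names are invented):

```python
# A dataflow as a chain of generators: one source, two operators, one sink.
def source(events):
    # In Flink this node would be e.g. a Kafka or Kinesis consumer.
    for e in events:
        yield e

def map_op(stream, fn):
    # A user-defined transformation applied to every event.
    for e in stream:
        yield fn(e)

def filter_op(stream, pred):
    # Another operator node in the directed graph.
    for e in stream:
        if pred(e):
            yield e

def sink(stream):
    # Terminal node: collect results (a real sink would write them out).
    return list(stream)

pipeline = sink(filter_op(map_op(source(range(5)), lambda x: x * 2),
                          lambda x: x > 2))
```

Each generator is one node of the directed graph; chaining them reproduces the one-transformation-per-operator correspondence the doc mentions, while a fused `map`+`filter` function would illustrate one transformation consisting of multiple operators.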
##########
File path: docs/tutorials/index.md
##########
@@ -0,0 +1,186 @@
+---
+title: Hands-on Tutorials
+nav-id: tutorials
+nav-pos: 2
+nav-title: '<i class="fa fa-hand-paper-o title appetizer" aria-hidden="true"></i> Hands-on Tutorials'
+nav-parent_id: root
+nav-show_overview: true
+always-expand: true
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+* This will be replaced by the TOC
+{:toc}
+
+## Goals and Scope of these Tutorials
+
+These tutorials present an introduction to Apache Flink that includes just enough to get you started
+writing scalable streaming ETL, analytics, and event-driven applications, while leaving out a lot of
+(ultimately important) details. The focus is on providing straightforward introductions to Flink's
+APIs for managing state and time, with the expectation that having mastered these fundamentals,
+you'll be much better equipped to pick up the rest of what you need to know from the more detailed
+reference documentation. The links at the end of each page will lead you to where you can learn
+more.
+
+Specifically, you will learn:
+
+- how to implement streaming data processing pipelines
+- how and why Flink manages state
+- how to use event time to consistently compute accurate analytics
+- how to build event-driven applications on continuous streams
+- how Flink is able to provide fault-tolerant, stateful stream processing with exactly-once semantics
+
+These tutorials focus on four critical concepts: continuous processing of streaming data, event
+time, stateful stream processing, and state snapshots. This page introduces these concepts.
+
+{% info Note %} Accompanying these tutorials are a set of hands-on exercises that will guide you
+through learning how to work with the concepts being presented.
+
+{% top %}
+
+## Stream Processing
+
+Streams are data's natural habitat. Whether it's events from web servers, trades from a stock
Review comment:
nit: I usually try to spell out things like "it is" to simplify things for non-native speakers...
```suggestion
Streams are data's natural habitat. Whether it is events from web servers, trades from a stock
```
##########
File path: docs/tutorials/index.md
##########
@@ -0,0 +1,186 @@
+---
+title: Hands-on Tutorials
+nav-id: tutorials
+nav-pos: 2
+nav-title: '<i class="fa fa-hand-paper-o title appetizer" aria-hidden="true"></i> Hands-on Tutorials'
+nav-parent_id: root
+nav-show_overview: true
+always-expand: true
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+* This will be replaced by the TOC
+{:toc}
+
+## Goals and Scope of these Tutorials
+
+These tutorials present an introduction to Apache Flink that includes just enough to get you started
+writing scalable streaming ETL, analytics, and event-driven applications, while leaving out a lot of
+(ultimately important) details. The focus is on providing straightforward introductions to Flink's
+APIs for managing state and time, with the expectation that having mastered these fundamentals,
+you'll be much better equipped to pick up the rest of what you need to know from the more detailed
+reference documentation. The links at the end of each page will lead you to where you can learn
+more.
+
+Specifically, you will learn:
+
+- how to implement streaming data processing pipelines
+- how and why Flink manages state
+- how to use event time to consistently compute accurate analytics
+- how to build event-driven applications on continuous streams
+- how Flink is able to provide fault-tolerant, stateful stream processing with exactly-once semantics
+
+These tutorials focus on four critical concepts: continuous processing of streaming data, event
+time, stateful stream processing, and state snapshots. This page introduces these concepts.
+
+{% info Note %} Accompanying these tutorials are a set of hands-on exercises that will guide you
+through learning how to work with the concepts being presented.
+
+{% top %}
+
+## Stream Processing
+
+Streams are data's natural habitat. Whether it's events from web servers, trades from a stock
+exchange, or sensor readings from a machine on a factory floor, data is created as part of a stream.
+But when you analyze data, you can either organize your processing around _bounded_ or _unbounded_
+streams, and which of these paradigms you choose has profound consequences.
+
+<img src="{{ site.baseurl }}/fig/bounded-unbounded.png" alt="Bounded and unbounded streams" class="offset" width="90%" />
+
+**Batch processing** is the paradigm at work when you process a bounded data stream. In this mode of
+operation you can choose to ingest the entire dataset before producing any results, which means that
+it's possible, for example, to sort the data, compute global statistics, or produce a final report
+that summarizes all of the input.
+
+**Stream processing**, on the other hand, involves unbounded data streams. Conceptually, at least,
+the input may never end, and so you are forced to continuously process the data as it arrives.
+
+In Flink, applications are composed of **streaming dataflows** that may be transformed by
+user-defined **operators**. These dataflows form directed graphs that start with one or more
+**sources**, and end in one or more **sinks**.
+
+<img src="{{ site.baseurl }}/fig/program_dataflow.svg" alt="A DataStream program, and its dataflow." class="offset" width="80%" />
+
+Often there is a one-to-one correspondence between the transformations in the programs and the
+operators in the dataflow. Sometimes, however, one transformation may consist of multiple operators.
+
+An application may consume real-time data from streaming sources such as message queues or
+distributed logs, such as Apache Kafka or Kinesis. But flink can also consume bounded, historic data
Review comment:
don't repeat "such as" again? Maybe replace with this?
```suggestion
distributed logs, like Apache Kafka or Kinesis. But flink can also consume
bounded, historic data
```
##########
File path: docs/tutorials/index.md
##########
@@ -0,0 +1,186 @@
+---
+title: Hands-on Tutorials
+nav-id: tutorials
+nav-pos: 2
+nav-title: '<i class="fa fa-hand-paper-o title appetizer" aria-hidden="true"></i> Hands-on Tutorials'
+nav-parent_id: root
+nav-show_overview: true
+always-expand: true
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+* This will be replaced by the TOC
+{:toc}
+
+## Goals and Scope of these Tutorials
+
+These tutorials present an introduction to Apache Flink that includes just enough to get you started
+writing scalable streaming ETL, analytics, and event-driven applications, while leaving out a lot of
+(ultimately important) details. The focus is on providing straightforward introductions to Flink's
+APIs for managing state and time, with the expectation that having mastered these fundamentals,
+you'll be much better equipped to pick up the rest of what you need to know from the more detailed
+reference documentation. The links at the end of each page will lead you to where you can learn
+more.
+
+Specifically, you will learn:
+
+- how to implement streaming data processing pipelines
+- how and why Flink manages state
+- how to use event time to consistently compute accurate analytics
+- how to build event-driven applications on continuous streams
+- how Flink is able to provide fault-tolerant, stateful stream processing with exactly-once semantics
+
+These tutorials focus on four critical concepts: continuous processing of streaming data, event
+time, stateful stream processing, and state snapshots. This page introduces these concepts.
+
+{% info Note %} Accompanying these tutorials are a set of hands-on exercises that will guide you
+through learning how to work with the concepts being presented.
+
+{% top %}
+
+## Stream Processing
+
+Streams are data's natural habitat. Whether it's events from web servers, trades from a stock
+exchange, or sensor readings from a machine on a factory floor, data is created as part of a stream.
+But when you analyze data, you can either organize your processing around _bounded_ or _unbounded_
+streams, and which of these paradigms you choose has profound consequences.
+
+<img src="{{ site.baseurl }}/fig/bounded-unbounded.png" alt="Bounded and unbounded streams" class="offset" width="90%" />
+
+**Batch processing** is the paradigm at work when you process a bounded data stream. In this mode of
+operation you can choose to ingest the entire dataset before producing any results, which means that
+it's possible, for example, to sort the data, compute global statistics, or produce a final report
+that summarizes all of the input.
+
+**Stream processing**, on the other hand, involves unbounded data streams. Conceptually, at least,
+the input may never end, and so you are forced to continuously process the data as it arrives.
+
+In Flink, applications are composed of **streaming dataflows** that may be transformed by
+user-defined **operators**. These dataflows form directed graphs that start with one or more
+**sources**, and end in one or more **sinks**.
+
+<img src="{{ site.baseurl }}/fig/program_dataflow.svg" alt="A DataStream program, and its dataflow." class="offset" width="80%" />
+
+Often there is a one-to-one correspondence between the transformations in the programs and the
+operators in the dataflow. Sometimes, however, one transformation may consist of multiple operators.
+
+An application may consume real-time data from streaming sources such as message queues or
+distributed logs, such as Apache Kafka or Kinesis. But flink can also consume bounded, historic data
+from a variety of data sources. Similarly, the streams of results being produced by a Flink
+application can be sent to a wide variety of systems, and the state held within Flink can be
+accessed via a REST API.
Review comment:
Accessing state within Flink via a REST API? That's not really available, is it?
##########
File path: docs/tutorials/index.md
##########
@@ -0,0 +1,186 @@
+---
+title: Hands-on Tutorials
+nav-id: tutorials
+nav-pos: 2
+nav-title: '<i class="fa fa-hand-paper-o title appetizer" aria-hidden="true"></i> Hands-on Tutorials'
+nav-parent_id: root
+nav-show_overview: true
+always-expand: true
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+* This will be replaced by the TOC
+{:toc}
+
+## Goals and Scope of these Tutorials
+
+These tutorials present an introduction to Apache Flink that includes just enough to get you started
+writing scalable streaming ETL, analytics, and event-driven applications, while leaving out a lot of
+(ultimately important) details. The focus is on providing straightforward introductions to Flink's
+APIs for managing state and time, with the expectation that having mastered these fundamentals,
+you'll be much better equipped to pick up the rest of what you need to know from the more detailed
+reference documentation. The links at the end of each page will lead you to where you can learn
+more.
+
+Specifically, you will learn:
+
+- how to implement streaming data processing pipelines
+- how and why Flink manages state
+- how to use event time to consistently compute accurate analytics
+- how to build event-driven applications on continuous streams
+- how Flink is able to provide fault-tolerant, stateful stream processing with exactly-once semantics
+
+These tutorials focus on four critical concepts: continuous processing of streaming data, event
+time, stateful stream processing, and state snapshots. This page introduces these concepts.
+
+{% info Note %} Accompanying these tutorials are a set of hands-on exercises that will guide you
+through learning how to work with the concepts being presented.
Review comment:
or mention that links to the according exercises will be available where needed?
##########
File path: docs/tutorials/index.md
##########
@@ -0,0 +1,186 @@
+---
+title: Hands-on Tutorials
+nav-id: tutorials
+nav-pos: 2
+nav-title: '<i class="fa fa-hand-paper-o title appetizer" aria-hidden="true"></i> Hands-on Tutorials'
+nav-parent_id: root
+nav-show_overview: true
+always-expand: true
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+* This will be replaced by the TOC
+{:toc}
+
+## Goals and Scope of these Tutorials
+
+These tutorials present an introduction to Apache Flink that includes just enough to get you started
+writing scalable streaming ETL, analytics, and event-driven applications, while leaving out a lot of
+(ultimately important) details. The focus is on providing straightforward introductions to Flink's
+APIs for managing state and time, with the expectation that having mastered these fundamentals,
+you'll be much better equipped to pick up the rest of what you need to know from the more detailed
+reference documentation. The links at the end of each page will lead you to where you can learn
+more.
+
+Specifically, you will learn:
+
+- how to implement streaming data processing pipelines
+- how and why Flink manages state
+- how to use event time to consistently compute accurate analytics
+- how to build event-driven applications on continuous streams
+- how Flink is able to provide fault-tolerant, stateful stream processing with exactly-once semantics
+
+These tutorials focus on four critical concepts: continuous processing of streaming data, event
+time, stateful stream processing, and state snapshots. This page introduces these concepts.
+
+{% info Note %} Accompanying these tutorials are a set of hands-on exercises that will guide you
+through learning how to work with the concepts being presented.
+
+{% top %}
+
+## Stream Processing
+
+Streams are data's natural habitat. Whether it's events from web servers, trades from a stock
+exchange, or sensor readings from a machine on a factory floor, data is created as part of a stream.
+But when you analyze data, you can either organize your processing around _bounded_ or _unbounded_
+streams, and which of these paradigms you choose has profound consequences.
+
+<img src="{{ site.baseurl }}/fig/bounded-unbounded.png" alt="Bounded and unbounded streams" class="offset" width="90%" />
+
+**Batch processing** is the paradigm at work when you process a bounded data stream. In this mode of
+operation you can choose to ingest the entire dataset before producing any results, which means that
+it's possible, for example, to sort the data, compute global statistics, or produce a final report
Review comment:
```suggestion
it is possible, for example, to sort the data, compute global statistics, or produce a final report
```
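The batch-versus-stream distinction quoted in this hunk (ingest everything before producing results, versus emitting results continuously) can be sketched in a few lines of plain Python. This is a conceptual illustration only, with invented function names, not Flink's APIs:

```python
# Batch: the bounded dataset is fully ingested before any result is produced,
# which is what makes global operations like sorting possible.
def batch_sort(bounded):
    data = list(bounded)          # read the entire input up front
    return sorted(data)

# Streaming: the input is conceptually unbounded, so a result must be emitted
# as each event arrives -- here, a running maximum per incoming event.
def streaming_max(unbounded):
    current = None
    for x in unbounded:
        current = x if current is None else max(current, x)
        yield current             # an updated result per event

sorted_report = batch_sort([3, 1, 2])
running = list(streaming_max(iter([3, 1, 5])))
```

Note that `batch_sort` cannot emit anything until the input ends, while `streaming_max` never needs the input to end at all; that is the profound consequence of the paradigm choice the doc describes.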
##########
File path: docs/tutorials/index.md
##########
@@ -0,0 +1,186 @@
+---
+title: Hands-on Tutorials
+nav-id: tutorials
+nav-pos: 2
+nav-title: '<i class="fa fa-hand-paper-o title appetizer" aria-hidden="true"></i> Hands-on Tutorials'
+nav-parent_id: root
+nav-show_overview: true
+always-expand: true
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+* This will be replaced by the TOC
+{:toc}
+
+## Goals and Scope of these Tutorials
+
+These tutorials present an introduction to Apache Flink that includes just
enough to get you started
+writing scalable streaming ETL, analytics, and event-driven applications,
while leaving out a lot of
+(ultimately important) details. The focus is on providing straightforward
introductions to Flink's
+APIs for managing state and time, with the expectation that having mastered
these fundamentals,
+you'll be much better equipped to pick up the rest of what you need to know
from the more detailed
+reference documentation. The links at the end of each page will lead you to
where you can learn
+more.
+
+Specifically, you will learn:
+
+- how to implement streaming data processing pipelines
+- how and why Flink manages state
+- how to use event time to consistently compute accurate analytics
+- how to build event-driven applications on continuous streams
+- how Flink is able to provide fault-tolerant, stateful stream processing with
exactly-once semantics
+
+These tutorials focus on four critical concepts: continuous processing of
streaming data, event
+time, stateful stream processing, and state snapshots. This page introduces
these concepts.
+
+{% info Note %} Accompanying these tutorials is a set of hands-on exercises
that will guide you
+through working with the concepts being presented.
+
+{% top %}
+
+## Stream Processing
+
+Streams are data's natural habitat. Whether it's events from web servers,
trades from a stock
+exchange, or sensor readings from a machine on a factory floor, data is
created as part of a stream.
+But when you analyze data, you can either organize your processing around
_bounded_ or _unbounded_
+streams, and which of these paradigms you choose has profound consequences.
+
+<img src="{{ site.baseurl }}/fig/bounded-unbounded.png" alt="Bounded and
unbounded streams" class="offset" width="90%" />
+
+**Batch processing** is the paradigm at work when you process a bounded data
stream. In this mode of
+operation you can choose to ingest the entire dataset before producing any
results, which means that
+it's possible, for example, to sort the data, compute global statistics, or
produce a final report
+that summarizes all of the input.
+
+**Stream processing**, on the other hand, involves unbounded data streams.
Conceptually, at least,
+the input may never end, and so you are forced to continuously process the
data as it arrives.
+
+In Flink, applications are composed of **streaming dataflows** that may be
transformed by
+user-defined **operators**. These dataflows form directed graphs that start
with one or more
+**sources**, and end in one or more **sinks**.
+
+<img src="{{ site.baseurl }}/fig/program_dataflow.svg" alt="A DataStream
program, and its dataflow." class="offset" width="80%" />
+
+Often there is a one-to-one correspondence between the transformations in the
programs and the
Review comment:
```suggestion
Often there is a one-to-one correspondence between the transformations in
the program and the
```
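The hunk above describes Flink applications as streaming dataflows: directed graphs of sources, user-defined operators, and sinks. As a minimal, framework-free sketch of that shape (Python generators standing in for streams; `source`, `map_operator`, and `sink` are illustrative names, not Flink API):

```python
# Sketch of a dataflow graph: source -> operator -> sink.
# Generators model streams that are processed one event at a time.

def source(events):
    """Source: emits events one at a time, as a stream would arrive."""
    for event in events:
        yield event

def map_operator(stream, fn):
    """A simple one-input, one-output user-defined transformation."""
    for event in stream:
        yield fn(event)

def sink(stream):
    """Sink: collects results (a real sink would write to an external system)."""
    return list(stream)

# Wire source -> map -> sink into a small directed dataflow.
results = sink(map_operator(source([1, 2, 3]), lambda x: x * 10))
```

Because each stage is a generator, nothing is materialized up front; events flow through the graph as they are produced, which is the essential difference from batch-style "read everything, then process".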
##########
File path: docs/tutorials/index.md
##########
@@ -0,0 +1,186 @@
+Often there is a one-to-one correspondence between the transformations in the
programs and the
+operators in the dataflow. Sometimes, however, one transformation may consist
of multiple operators.
+
+An application may consume real-time data from streaming sources such as
message queues or
+distributed logs (e.g., Apache Kafka or Kinesis). But Flink can also consume
bounded, historical data
+from a variety of data sources. Similarly, the streams of results being
produced by a Flink
+application can be sent to a wide variety of systems, and the state held
within Flink can be
+accessed via a REST API.
+
+<img src="{{ site.baseurl }}/fig/flink-application-sources-sinks.png"
alt="Flink application with sources and sinks" class="offset" width="90%" />
+
+### Parallel Dataflows
+
+Programs in Flink are inherently parallel and distributed. During execution, a
+*stream* has one or more **stream partitions**, and each *operator* has one or
+more **operator subtasks**. The operator subtasks are independent of one
+another, and execute in different threads and possibly on different machines or
+containers.
+
+The number of operator subtasks is the **parallelism** of that particular
+operator. The parallelism of a stream is always that of its producing operator.
+Different operators of the same program may have different levels of
+parallelism.
+
+<img src="{{ site.baseurl }}/fig/parallel_dataflow.svg" alt="A parallel
dataflow" class="offset" width="80%" />
Review comment:
FYI: task vs. subtask should also be changed in the image if changed in
text (as proposed above)
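The keyed redistribution described in this hunk can be sketched without Flink: hash the key, and let the hash pick the target operator subtask, so that all events with the same key land on the same parallel instance. This mirrors the idea behind `keyBy()`, not Flink's actual key-group mechanics; the function and variable names here are illustrative.

```python
import hashlib

def subtask_for_key(key, parallelism):
    """Stable hash of the key, so the same key always picks the same subtask."""
    digest = hashlib.sha256(str(key).encode()).hexdigest()
    return int(digest, 16) % parallelism

# Route a small keyed stream to two parallel subtasks.
events = [("user-a", 1), ("user-b", 2), ("user-a", 3)]
parallelism = 2
routed = {}
for key, value in events:
    routed.setdefault(subtask_for_key(key, parallelism), []).append((key, value))
# All events for "user-a" end up in the same subtask's bucket.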
##########
File path: docs/tutorials/index.md
##########
@@ -0,0 +1,186 @@
+The number of operator subtasks is the **parallelism** of that particular
+operator. The parallelism of a stream is always that of its producing operator.
Review comment:
Do you actually need the concept of the "parallelism of a stream"?
Because since there could be n->m streams, I find it difficult to just use "n"
here and also I rarely use parallelism for the stream itself...
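The reviewer's n-to-m point can be illustrated with a `rebalance()`-style exchange, where producer and consumer parallelism need not match because records are simply dealt round-robin. This is a conceptual sketch, not Flink's network stack:

```python
from itertools import cycle

def rebalance(producer_outputs, m):
    """Merge n producer streams and deal their records round-robin to m consumers."""
    consumers = [[] for _ in range(m)]
    targets = cycle(range(m))
    for stream in producer_outputs:
        for record in stream:
            consumers[next(targets)].append(record)
    return consumers

# n = 2 producer subtasks feeding m = 3 consumer subtasks.
out = rebalance([[1, 2, 3], [4, 5, 6]], 3)
```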
##########
File path: docs/tutorials/index.md
##########
@@ -0,0 +1,186 @@
+Streams can transport data between two operators in a *one-to-one* (or
+*forwarding*) pattern, or in a *redistributing* pattern:
+
+ - **One-to-one** streams (for example between the *Source* and the *map()*
+ operators in the figure above) preserve the partitioning and ordering of
+ the elements. That means that subtask[1] of the *map()* operator will see
+ the same elements in the same order as they were produced by subtask[1] of
+ the *Source* operator.
+
+ - **Redistributing** streams (as between *map()* and *keyBy/window* above, as
+ well as between *keyBy/window* and *Sink*) change the partitioning of
+ streams. Each *operator subtask* sends data to different target subtasks,
+ depending on the selected transformation. Examples are *keyBy()* (which
+ re-partitions by hashing the key), *broadcast()*, or *rebalance()* (which
+ re-partitions randomly). In a *redistributing* exchange the ordering among
+ the elements is only preserved within each pair of sending and receiving
+ subtasks (for example, subtask[1] of *map()* and subtask[2] of
+ *keyBy/window*). So, for example, the redistribution between the
keyBy/window and
+ the Sink operators shown above introduces non-determinism regarding the
+ order in which the aggregated results for different keys arrive at the
Sink.
+
+{% top %}
+
+## Timely Stream Processing
+
+For most streaming applications it is very valuable to be able to re-process
historical data with the
+same code that is used to process live data -- and to produce deterministic,
consistent results,
+regardless.
+
+It can also be crucial to pay attention to the order in which events occurred,
rather than the order
+in which they are delivered for processing, and to be able to reason about
when a set of events is
+(or should be) complete. For example, consider the set of events involved in
an e-commerce
+transaction, or financial trade.
+
+These requirements for timely stream processing can be met by using event time
timestamps that are
+recorded in the data stream, rather than using the clocks of the machines
processing the data.
+
+{% top %}
+
+## Stateful Stream Processing
+
+Flink's operations can be stateful. This means that how one event is handled
can depend on the
+accumulated effect of all the events that came before it. State may be used
for something simple,
+such as counting events per minute to display on a dashboard, or for something
more complex, such as
+computing features for a fraud detection model.
+
+A Flink application is run in parallel on a distributed cluster. The various
parallel instances of a
+given operator will execute independently, in separate threads, and in general
will be running on
+different machines.
+
+The set of parallel instances of a stateful operator is effectively a sharded
key-value store. Each
+parallel instance is responsible for handling events for a specific group of
keys, and the state for
+those keys is kept locally.
+
+The diagram below shows a job running with a parallelism of two across the
first three operators in
+the job graph, terminating in a sink that has a parallelism of one. The third
operator is stateful,
+and you can see that a fully connected network shuffle is occurring between
the second and third
+operators. This is being done to partition the stream by some key, so that all
of the events that
+need to be processed together will be.
+
+<img src="{{ site.baseurl }}/fig/parallel-job.png" alt="State is sharded"
class="offset" width="65%" />
+
+State is always accessed locally, which helps Flink applications achieve high
throughput and
+low-latency. You can choose to keep state on the JVM heap, or if it is too
large, in efficiently
+organized on-disk data structures.
Review comment:
```suggestion
low-latency. You can choose to keep state on the JVM heap, or if it is too
large, in
efficiently-organized on-disk data structures.
```
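The event-time idea in this hunk, grouping events by the timestamp recorded in the event rather than by the processing machine's clock, can be sketched as a simple windowed count. Assuming illustrative `(timestamp, key)` tuples and a fixed 60-second window (none of this is Flink API):

```python
from collections import defaultdict

WINDOW_SIZE = 60  # window length in seconds of event time

def window_counts(events):
    """Count events per 60-second event-time window, keyed by (window_start, key)."""
    counts = defaultdict(int)
    for timestamp, key in events:
        window_start = (timestamp // WINDOW_SIZE) * WINDOW_SIZE
        counts[(window_start, key)] += 1
    return dict(counts)

# Out-of-order delivery does not change the result, because grouping
# uses the embedded event timestamps, not the arrival order.
in_order = [(0, "a"), (30, "a"), (65, "a")]
shuffled = [(65, "a"), (0, "a"), (30, "a")]
```

This is why replaying historical data through the same code can produce the same results as live processing: the computation depends only on the timestamps carried in the stream.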
##########
File path: docs/tutorials/index.md
##########
@@ -0,0 +1,186 @@
+<img src="{{ site.baseurl }}/fig/local-state.png" alt="State is local"
class="offset" width="90%" />
+
+{% top %}
+
+## Fault Tolerance via State Snapshots
+
+Flink is able to provide fault-tolerant, exactly-once semantics through a
combination of state
+snapshots and stream replay. These snapshots capture the entire state of the
distributed pipeline,
+recording offsets into the input queues as well as the state throughout the
job graph that has
+resulted from having ingested the data up to that point. When a failure
occurs, the sources are
+rewound, the state is restored, and processing is resumed. As depicted above,
these state snapshots
+are captured asynchronously, without impeding the ongoing processing.
Review comment:
"Understatement of the year" ;)
But for the training, and as an introduction, it is fine to leave out the
details here.
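The snapshot-and-replay mechanism described in the quoted paragraph can be sketched as a toy model in plain Python (not Flink's actual checkpoint implementation): a snapshot records the source offset together with the operator state, and on failure the source is rewound to that offset before processing resumes.

```python
# Toy sketch of recovery via state snapshots plus stream replay
# (illustrative only, not Flink's actual checkpoint implementation).
# A snapshot records the source offset together with the operator
# state; on failure the source is rewound and processing resumes,
# so every event is reflected in the state exactly once.

events = [3, 1, 4, 1, 5, 9, 2, 6]

def run(snapshot_at, fail_at):
    offset, total = 0, 0      # source offset and operator state (a running sum)
    snapshot = (0, 0)
    try:
        while offset < len(events):
            if offset == snapshot_at:
                snapshot = (offset, total)   # capture offset + state together
            if offset == fail_at:
                raise RuntimeError("worker lost")
            total += events[offset]
            offset += 1
    except RuntimeError:
        offset, total = snapshot             # restore state, rewind the source
        while offset < len(events):
            total += events[offset]          # replay from the snapshot offset
            offset += 1
    return total

result = run(snapshot_at=4, fail_at=6)   # fails mid-stream, then recovers
```

Despite the failure, the final sum equals the sum over all events, with none lost and none double-counted.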
##########
File path: docs/tutorials/index.md
##########
@@ -0,0 +1,186 @@
+---
+title: Hands-on Tutorials
+nav-id: tutorials
+nav-pos: 2
+nav-title: '<i class="fa fa-hand-paper-o title appetizer"
aria-hidden="true"></i> Hands-on Tutorials'
+nav-parent_id: root
+nav-show_overview: true
+always-expand: true
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+* This will be replaced by the TOC
+{:toc}
+
+## Goals and Scope of these Tutorials
+
+These tutorials present an introduction to Apache Flink that includes just
enough to get you started
+writing scalable streaming ETL, analytics, and event-driven applications,
while leaving out a lot of
+(ultimately important) details. The focus is on providing straightforward
introductions to Flink's
+APIs for managing state and time, with the expectation that having mastered
these fundamentals,
+you'll be much better equipped to pick up the rest of what you need to know
from the more detailed
+reference documentation. The links at the end of each page will lead you to
where you can learn
+more.
+
+Specifically, you will learn:
+
+- how to implement streaming data processing pipelines
+- how and why Flink manages state
+- how to use event time to consistently compute accurate analytics
+- how to build event-driven applications on continuous streams
+- how Flink is able to provide fault-tolerant, stateful stream processing with
exactly-once semantics
+
+These tutorials focus on four critical concepts: continuous processing of
streaming data, event
+time, stateful stream processing, and state snapshots. This page introduces
these concepts.
+
+{% info Note %} Accompanying these tutorials is a set of hands-on exercises
that will guide you
+through working with the concepts being presented.
+
+{% top %}
+
+## Stream Processing
+
+Streams are data's natural habitat. Whether it's events from web servers,
trades from a stock
+exchange, or sensor readings from a machine on a factory floor, data is
created as part of a stream.
+But when you analyze data, you can organize your processing around either
_bounded_ or _unbounded_
+streams, and which of these paradigms you choose has profound consequences.
+
+<img src="{{ site.baseurl }}/fig/bounded-unbounded.png" alt="Bounded and
unbounded streams" class="offset" width="90%" />
+
+**Batch processing** is the paradigm at work when you process a bounded data
stream. In this mode of
+operation you can choose to ingest the entire dataset before producing any
results, which means that
+it's possible, for example, to sort the data, compute global statistics, or
produce a final report
+that summarizes all of the input.
+
+**Stream processing**, on the other hand, involves unbounded data streams.
Conceptually, at least,
+the input may never end, and so you are forced to continuously process the
data as it arrives.
+
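The practical difference between the two paradigms can be shown with a small plain-Python illustration (not Flink code): a batch job can see the whole bounded input before answering, while a streaming job must emit results incrementally.

```python
# Plain-Python illustration (not Flink code) of the two paradigms:
# a batch job can ingest the entire bounded input before producing a
# result, while a streaming job must emit results incrementally.

data = [5, 3, 8, 1]

# Batch: the whole dataset is available, so global operations like
# sorting or computing a final maximum are possible.
batch_max = max(data)

# Streaming: events arrive one at a time, so only a running result
# can be maintained; there is never a final, global answer.
def running_max(stream):
    current = None
    for event in stream:
        current = event if current is None else max(current, event)
        yield current

stream_results = list(running_max(iter(data)))
```

The streaming version produces an answer after every event, and its last emitted value only happens to be final because this particular input is bounded.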
+In Flink, applications are composed of **streaming dataflows** that may be
transformed by
+user-defined **operators**. These dataflows form directed graphs that start
with one or more
+**sources**, and end in one or more **sinks**.
+
+<img src="{{ site.baseurl }}/fig/program_dataflow.svg" alt="A DataStream
program, and its dataflow." class="offset" width="80%" />
Review comment:
This code example should be updated:
* not using `keyBy("id")` based on a string - bad practice and may also be
removed in the future
* not using the `BucketingSink` (this line also has a rendering error)
##########
File path: docs/tutorials/index.md
##########
@@ -0,0 +1,186 @@
+
+Often there is a one-to-one correspondence between the transformations in the
programs and the
+operators in the dataflow. Sometimes, however, one transformation may consist
of multiple operators.
+
+An application may consume real-time data from streaming sources such as
message queues or
+distributed logs (for example, Apache Kafka or Kinesis). But Flink can also consume
bounded, historic data
+from a variety of data sources. Similarly, the streams of results being
produced by a Flink
+application can be sent to a wide variety of systems, and the state held
within Flink can be
+accessed via a REST API.
+
+<img src="{{ site.baseurl }}/fig/flink-application-sources-sinks.png"
alt="Flink application with sources and sinks" class="offset" width="90%" />
+
+### Parallel Dataflows
+
+Programs in Flink are inherently parallel and distributed. During execution, a
+*stream* has one or more **stream partitions**, and each *operator* has one or
+more **operator subtasks**. The operator subtasks are independent of one
Review comment:
the new term for "subtask" is "task", but, according to the
[glossary](https://ci.apache.org/projects/flink/flink-docs-master/concepts/glossary.html#sub-task),
a "Sub-Task" is the same but "emphasizes that there are multiple parallel
Tasks for the same Operator or Operator Chain". I think, we should use "Task"
here (not sure about capitalization).
##########
File path: docs/tutorials/index.md
##########
@@ -0,0 +1,186 @@
+### Parallel Dataflows
+
+Programs in Flink are inherently parallel and distributed. During execution, a
+*stream* has one or more **stream partitions**, and each *operator* has one or
+more **operator subtasks**. The operator subtasks are independent of one
+another, and execute in different threads and possibly on different machines or
+containers.
+
+The number of operator subtasks is the **parallelism** of that particular
+operator. The parallelism of a stream is always that of its producing operator.
+Different operators of the same program may have different levels of
+parallelism.
+
+<img src="{{ site.baseurl }}/fig/parallel_dataflow.svg" alt="A parallel
dataflow" class="offset" width="80%" />
+
+Streams can transport data between two operators in a *one-to-one* (or
+*forwarding*) pattern, or in a *redistributing* pattern:
+
+ - **One-to-one** streams (for example between the *Source* and the *map()*
+ operators in the figure above) preserve the partitioning and ordering of
+ the elements. That means that subtask[1] of the *map()* operator will see
+ the same elements in the same order as they were produced by subtask[1] of
+ the *Source* operator.
+
+ - **Redistributing** streams (as between *map()* and *keyBy/window* above, as
+ well as between *keyBy/window* and *Sink*) change the partitioning of
+ streams. Each *operator subtask* sends data to different target subtasks,
+ depending on the selected transformation. Examples are *keyBy()* (which
+ re-partitions by hashing the key), *broadcast()*, or *rebalance()* (which
+ re-partitions randomly). In a *redistributing* exchange the ordering among
+ the elements is only preserved within each pair of sending and receiving
+ subtasks (for example, subtask[1] of *map()* and subtask[2] of
+ *keyBy/window*). So, for example, the redistribution between the
keyBy/window and
+ the Sink operators shown above introduces non-determinism regarding the
+ order in which the aggregated results for different keys arrive at the
Sink.
+
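The redistributing pattern can be sketched as a toy model in plain Python (illustrative only, not Flink internals): a *keyBy()*-style exchange hashes each event's key to pick the downstream subtask, so all events for one key go to the same place, and only the per-key order is preserved.

```python
# Toy sketch of a redistributing exchange (illustrative, not Flink
# internals): keyBy() re-partitions by hashing the key, so all events
# with the same key go to the same downstream subtask, and only the
# per-key order is preserved.

def key_by(events, key_fn, parallelism):
    """Assign each event to a downstream subtask queue by key hash."""
    queues = [[] for _ in range(parallelism)]
    for event in events:
        queues[hash(key_fn(event)) % parallelism].append(event)
    return queues

events = [("user1", 10), ("user2", 20), ("user1", 30), ("user3", 40)]
queues = key_by(events, key_fn=lambda e: e[0], parallelism=2)

# All events for "user1" land in one queue, in their original order;
# how different keys interleave downstream is not deterministic.
user1_events = [e for q in queues for e in q if e[0] == "user1"]
```

This is why results for different keys may arrive at the sink in a non-deterministic order, while each individual key's events stay in sequence.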
+{% top %}
+
+## Timely Stream Processing
+
+For most streaming applications it is very valuable to be able to re-process
historic data with the
+same code that is used to process live data -- and to produce deterministic,
consistent results,
+regardless.
+
+It can also be crucial to pay attention to the order in which events occurred,
rather than the order
+in which they are delivered for processing, and to be able to reason about
when a set of events is
+(or should be) complete. For example, consider the set of events involved in
an e-commerce
+transaction, or financial trade.
+
+These requirements for timely stream processing can be met by using event time
timestamps that are
+recorded in the data stream, rather than using the clocks of the machines
processing the data.
+
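A minimal sketch in plain Python (not Flink code) shows why embedded event-time timestamps make results deterministic: events are bucketed into one-minute windows by the timestamp recorded in the event itself, so out-of-order delivery does not change the outcome.

```python
# Minimal sketch (plain Python, not Flink code) of event-time
# processing: events are bucketed into one-minute windows using the
# timestamp recorded in the event itself, so out-of-order delivery
# does not change the result.

from collections import defaultdict

WINDOW_MS = 60_000  # one-minute tumbling windows

def count_per_window(events):
    """events: iterable of (event_time_ms, payload); counts per window start."""
    counts = defaultdict(int)
    for ts, _ in events:
        counts[(ts // WINDOW_MS) * WINDOW_MS] += 1
    return dict(counts)

# The same three events, delivered in two different orders...
in_order = [(1_000, "a"), (2_000, "b"), (61_000, "c")]
shuffled = [(61_000, "c"), (1_000, "a"), (2_000, "b")]

# ...yield identical, deterministic per-window counts.
deterministic = count_per_window(in_order) == count_per_window(shuffled)
```

Had the windows been assigned by the processing machine's clock instead, the counts would depend on delivery timing and could not be reproduced when replaying historic data.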
+{% top %}
+
+## Stateful Stream Processing
+
+Flink's operations can be stateful. This means that how one event is handled
can depend on the
+accumulated effect of all the events that came before it. State may be used
for something simple,
+such as counting events per minute to display on a dashboard, or for something
more complex, such as
+computing features for a fraud detection model.
+
+A Flink application is run in parallel on a distributed cluster. The various
parallel instances of a
+given operator will execute independently, in separate threads, and in general
will be running on
+different machines.
+
+The set of parallel instances of a stateful operator is effectively a sharded
key-value store. Each
+parallel instance is responsible for handling events for a specific group of
keys, and the state for
+those keys is kept locally.
+
+The diagram below shows a job running with a parallelism of two across the
first three operators in
+the job graph, terminating in a sink that has a parallelism of one. The third
operator is stateful,
+and you can see that a fully connected network shuffle is occurring between
the second and third
Review comment:
?
```suggestion
and you can see that a fully-connected network shuffle is occurring between
the second and third
```
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]