dawidwys commented on a change in pull request #19107:
URL: https://github.com/apache/flink/pull/19107#discussion_r827858704
##########
File path: docs/content/docs/concepts/glossary.md
##########
@@ -25,182 +25,605 @@ under the License.
# Glossary
+#### Aggregation
+
+Aggregation is an operation that takes multiple values and returns a single
value. When working with
+streams, it generally makes more sense to think in terms of aggregations over
finite windows, rather
+than over the entire stream.
+
+#### (Flink) Application
+
+A Flink application is any user program that submits one or multiple [Flink
Jobs](#flink-job) from its
+`main()` method. The execution of these jobs can happen in a local JVM or on a
remote setup of clusters
+with multiple machines.
+
+The jobs of an application can either be submitted to a long-running [Session
Cluster](#session-cluster),
+to a dedicated [Application Cluster](#application-cluster), or to a [Job
Cluster](#job-cluster).
+
+#### Application Cluster
+
+A Flink application cluster is a dedicated [Flink cluster](#(flink)-cluster)
that only executes
+[Flink jobs](#flink-job) from one [Flink application](#(flink)-application).
The lifetime of the Flink
+cluster is bound to the lifetime of the Flink application.
+
+#### Asynchronous Snapshotting
+
+A form of [snapshotting](#snapshot) that doesn't impede the ongoing stream
processing by allowing an
+operator to continue processing while it stores its state snapshot,
effectively letting the state
+snapshots happen asynchronously in the background.
+
+#### At-least-once
+
+A fault-tolerance guarantee and data delivery approach where multiple attempts
are made at delivering
+an event such that at least one succeeds. This guarantees that nothing is
lost, but you may experience
+duplicated results.
+
+#### At-most-once
+
+A data delivery approach where each event is delivered zero or one times.
There is lower latency but
+events may be lost.
+
+#### Backpressure
+
+A situation where a system is receiving data at a higher rate than it can
process during a temporary
+load spike.
+
+#### Barrier Alignment
+
+For providing exactly-once guarantees, Flink aligns the streams at operators
that receive multiple
+input streams, so that the snapshot will reflect the state resulting from
consuming events from both
+input streams up to (but not past) both barriers.
+
+#### Batch Processing
+
+This is the processing and analysis on a set of data that have already been
stored over a period
+of time (i.e. in groups or batches). The results are usually not available in
real-time. Flink
+executes batch programs as a special case of streaming programs.
+
+#### Bounded Streams
+
+Bounded [DataStreams](#datastream) have a defined start and end. They can be
processed by ingesting
+all data before performing any computations. Ordered ingestion is not required
to process bounded streams
+because a bounded data set can always be sorted. Processing of bounded streams
is also known as
+[batch processing](#batch-processing).
+
+#### Checkpoint
+
+A [snapshot](#snapshot) taken automatically by Flink for the purpose of being
able to recover from
+faults. A checkpoint marks a specific point in each of the input streams along
with the corresponding
+state for each of the operators. Checkpoints can be incremental and unaligned,
and are optimized for
+being restored quickly.
+
+#### Checkpoint Barrier
+
+A special marker that flows along the graph and triggers the checkpointing
process on each of the
+parallel instances of the operators. Checkpoint barriers are injected into the
source operators and
+flow together with the data. If an operator has multiple outputs, it gets
"split" into both of them.
+
+#### Checkpoint Coordinator
+
+This coordinates the distributed snapshots of operators and state. It is part
of the JobManager and
+instructs the TaskManager when to begin a checkpoint by sending the messages
to the relevant tasks
+and collecting the checkpoint acknowledgements.
+
#### Checkpoint Storage
-The location where the [State Backend](#state-backend) will store its snapshot
during a checkpoint (Java Heap of [JobManager](#flink-jobmanager) or
Filesystem).
+The location where the [state backend](#state-backend) will store its snapshot
during a checkpoint.
+This could be on the Java heap of the [JobManager](#flink-jobmanager) or on a
file system.
+
+#### (Flink) Client
+
+This is not part of the runtime and program execution but is used to prepare
and send a dataflow graph
+to the JobManager. The Flink client runs either as part of the program that
triggers the execution or
+in the command line process via `./bin/flink run`.
+
+#### (Flink) Cluster
-#### Flink Application Cluster
+A distributed system consisting of (typically) one [JobManager](#jobmanager)
and one or more
+[TaskManager](#taskmanager) processes.
-A Flink Application Cluster is a dedicated [Flink Cluster](#flink-cluster) that
-only executes [Flink Jobs](#flink-job) from one [Flink
-Application](#flink-application). The lifetime of the [Flink
-Cluster](#flink-cluster) is bound to the lifetime of the Flink Application.
+#### Connected Streams
-#### Flink Job Cluster
+A pattern in Flink where a single operator has two input streams. Connected
streams can also be used
+to implement streaming joins.
-A Flink Job Cluster is a dedicated [Flink Cluster](#flink-cluster) that only
-executes a single [Flink Job](#flink-job). The lifetime of the
-[Flink Cluster](#flink-cluster) is bound to the lifetime of the Flink Job.
-This deployment mode has been deprecated since Flink 1.15.
+#### Connectors
-#### Flink Cluster
+Connectors allow [Flink applications](#(flink)-applications) to read from and
write to various external
+systems. They support multiple formats in order to encode and decode data to
match Flink's data structures.
-A distributed system consisting of (typically) one
[JobManager](#flink-jobmanager) and one or more
-[Flink TaskManager](#flink-taskmanager) processes.
+#### Dataflow
+
+See [logical graph](#logical-graph).
+
+#### DataStream
+
+This is a collection of data in a Flink application. You can think of them as
immutable collections
+of data that can contain duplicates. This data can either be finite or
unbounded.
+
+#### Directed Acyclic Graph (DAG)
+
+This is a graph that is directed and without cycles connecting the other
edges. It can be used to
+conceptually represent a [dataflow](#dataflow) where you never look back to
previous events.
+
+#### Dispatcher
+
+This is a component of the [JobManager](#jobmanager) and provides a REST
interface to submit Flink
+applications for execution and starts a new [JobMaster](#jobmaster) for each
submitted job. It also
+runs the Flink web UI to provide information about job executions.
#### Event
-An event is a statement about a change of the state of the domain modelled by
the
-application. Events can be input and/or output of a stream or batch processing
application.
-Events are special types of [records](#Record).
+An event is a statement about a change of the state of the domain modelled by
the application. Events
+can be input and/or output of a stream processing application. Events are
special types of
+[records](#Record).
+
+#### Event Time
+
+The time when an [event](#event) occurred, as recorded by the device producing
(or storing) the event.
+For reproducible results, you should use event time because the result does
not depend on when the
+calculation is performed.
+
+If you want to use event time, you will also need to supply a Timestamp
Extractor and Watermark Generator
+that Flink will use to track the progress of event time.
+
+#### Exactly-once
+
+A fault-tolerance guarantee and data delivery approach where nothing is lost
or duplicated. This does
+not mean that every event will be processed exactly once. Instead, it means
that every event will affect
+the state being managed by Flink exactly once.
#### ExecutionGraph
-see [Physical Graph](#physical-graph)
+See [Physical Graph](#physical-graph).
+
+#### Externalized Checkpoint
Review comment:
I agree "Retained checkpoint" is a better name.
As for the other part: no, checkpoints are not relocatable and they are not
self-contained, especially incremental checkpoints. You can assume that only
for savepoints.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.