[
https://issues.apache.org/jira/browse/BEAM-11759?focusedWorklogId=592828&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-592828
]
ASF GitHub Bot logged work on BEAM-11759:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 04/May/21 16:20
Start Date: 04/May/21 16:20
Worklog Time Spent: 10m
Work Description: TheNeuralBit commented on a change in pull request
#14717:
URL: https://github.com/apache/beam/pull/14717#discussion_r625922517
##########
File path: website/www/site/content/en/documentation/glossary.md
##########
@@ -0,0 +1,464 @@
+---
+title: "Beam glossary"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Apache Beam glossary
+
+## Aggregation
+
+A transform pattern for computing a value from multiple input elements.
Aggregation is similar to the reduce operation in the
[MapReduce](https://en.wikipedia.org/wiki/MapReduce) model. Aggregation
transforms include Count (computes the count of all elements in the
aggregation), Max (computes the maximum element in the aggregation), and Sum
(computes the sum of all elements in the aggregation).
+
+For a complete list of aggregation transforms, see:
+
+* [Java Transform catalog](/documentation/transforms/java/overview/)
+* [Python Transform catalog](/documentation/transforms/python/overview/)
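As a rough plain-Python sketch (not the Beam API), each of the aggregations named above reduces a whole collection to a single value:

```python
elements = [3, 1, 4, 1, 5]

# Each aggregation computes one value from every input element,
# like the reduce step in the MapReduce model:
count_result = len(elements)   # Count: number of elements -> 5
max_result = max(elements)     # Max: largest element -> 5
sum_result = sum(elements)     # Sum: total of all elements -> 14
```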
+
+## Apply
+
+A method for invoking a transform on a PCollection. Each transform in the Beam
SDKs has a generic `apply` method (or pipe operator `|`). Invoking multiple
Beam transforms is similar to method chaining, but with a difference: You apply
the transform to the input PCollection, passing the transform itself as an
argument, and the operation returns the output PCollection. Because of Beam’s
deferred execution model, applying a transform does not immediately execute
that transform.
+
+To learn more, see:
+
+* [Applying transforms](/documentation/programming-guide/#applying-transforms)
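A toy model of the pattern (not the Beam API): applying a transform with the pipe operator takes the input collection and returns a new output collection, which is what makes chaining possible.

```python
class ToyPCollection:
    """Toy stand-in for a PCollection; real PCollections are deferred
    and distributed, this one is just an in-memory list."""

    def __init__(self, elements):
        self.elements = list(elements)

    def __or__(self, transform):
        # Applying a transform returns a NEW collection;
        # the input collection is not modified.
        return ToyPCollection(transform(self.elements))

double = lambda xs: [x * 2 for x in xs]
keep_even = lambda xs: [x for x in xs if x % 2 == 0]

# Chained application, mirroring `pcoll | Transform1 | Transform2`:
out = ToyPCollection([1, 2, 3]) | double | keep_even
# out.elements == [2, 4, 6]
```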
+
+## Batch processing
+
+A data processing paradigm for working with finite, or bounded, datasets. A
bounded PCollection represents a dataset of a known, fixed size. Reading from a
batch data source, such as a file or a database, creates a bounded PCollection.
A batch processing job eventually ends, in contrast to a streaming job, which
runs until cancelled.
+
+To learn more, see:
+
+* [Size and boundedness](/documentation/programming-guide/#size-and-boundedness)
+
+## Bounded data
+
+A dataset of a known, fixed size. A PCollection can be bounded or unbounded,
depending on the source of the data that it represents. Reading from a batch
data source, such as a file or a database, creates a bounded PCollection. Beam
also supports reading a bounded amount of data from an unbounded source.
+
+To learn more, see:
+
+* [Size and boundedness](/documentation/programming-guide/#size-and-boundedness)
+
+## Bundle
+
+The processing unit for elements in a PCollection. Instead of processing all
elements in a PCollection simultaneously, Beam processes the elements in
bundles. The runner handles the division of the collection into bundles, and in
doing so it may optimize the bundle size for the use case. For example, a
streaming runner might process smaller bundles than a batch runner.
+
+To learn more, see:
+
+* [Bundling and persistence](/documentation/runtime/model/#bundling-and-persistence)
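The idea can be sketched in plain Python (in real Beam the runner, not user code, picks the bundle boundaries and sizes):

```python
def bundles(elements, size):
    """Toy illustration of bundling: split a collection into
    fixed-size processing units. Runners choose sizes dynamically."""
    for i in range(0, len(elements), size):
        yield elements[i:i + size]

# A collection of 5 elements processed as bundles of up to 2:
list(bundles([1, 2, 3, 4, 5], 2))  # [[1, 2], [3, 4], [5]]
```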
+
+## Coder
+
+A component that describes how the elements of a PCollection can be encoded
and decoded. To support distributed processing and cross-language portability,
Beam needs to be able to encode each element of a PCollection as bytes. The
Beam SDKs provide built-in coders for common types and language-specific
mechanisms for specifying the encoding of a PCollection.
+
+To learn more, see:
+
+* [Data encoding and type safety](/documentation/programming-guide/#data-encoding-and-type-safety)
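A hypothetical coder, sketched in plain Python, shows the contract: encode an element to bytes and decode those bytes back to an equal element (Beam's built-in coders are more compact and type-specific, but obey the same round-trip property).

```python
import json

def encode(element):
    # Hypothetical coder: serialize an element to bytes so it can be
    # shipped between workers or across language boundaries.
    return json.dumps(element).encode("utf-8")

def decode(data):
    # Decoding must reconstruct an element equal to the original.
    return json.loads(data.decode("utf-8"))

element = {"user": "alice", "score": 42}
roundtrip = decode(encode(element))
# roundtrip == element
```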
+
+## CoGroupByKey
+
+A PTransform that takes two or more PCollections and aggregates the elements
by key. In effect, CoGroupByKey performs a relational join of two or more
key/value PCollections that have the same key type. While GroupByKey performs
this operation over a single input collection, CoGroupByKey operates over
multiple input collections.
+
+To learn more, see:
+
+* [CoGroupByKey](/documentation/programming-guide/#cogroupbykey)
+* [CoGroupByKey (Java)](/documentation/transforms/java/aggregation/cogroupbykey/)
+* [CoGroupByKey (Python)](/documentation/transforms/python/aggregation/cogroupbykey/)
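The join semantics can be modeled in plain Python (a toy sketch, not the Beam API): for each key, collect the values from every input collection side by side.

```python
from collections import defaultdict

def cogroup_by_key(*pcollections):
    """Toy model of CoGroupByKey: for each key, gather the values
    from each keyed input collection into parallel lists."""
    result = defaultdict(lambda: [[] for _ in pcollections])
    for i, pcollection in enumerate(pcollections):
        for key, value in pcollection:
            result[key][i].append(value)
    return dict(result)

emails = [("amy", "amy@example.com"), ("carl", "carl@example.com")]
phones = [("amy", "555-1234")]
joined = cogroup_by_key(emails, phones)
# {'amy': [['amy@example.com'], ['555-1234']],
#  'carl': [['carl@example.com'], []]}
```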
+
+## Collection
+
+See [PCollection](/documentation/glossary/#pcollection).
+
+## Combine
+
+A PTransform for combining all elements of a PCollection or all values
associated with a key. When you apply a Combine transform, you have to provide
a user-defined function (UDF) that contains the logic for combining the
elements or values. The combining function should be
[commutative](https://en.wikipedia.org/wiki/Commutative_property) and
[associative](https://en.wikipedia.org/wiki/Associative_property), because the
function is not necessarily invoked exactly once on all values with a given key.
+
+To learn more, see:
+
+* [Combine](/documentation/programming-guide/#combine)
+* [Combine (Java)](/documentation/transforms/java/aggregation/combine/)
+* [CombineGlobally (Python)](/documentation/transforms/python/aggregation/combineglobally/)
+* [CombinePerKey (Python)](/documentation/transforms/python/aggregation/combineperkey/)
+* [CombineValues (Python)](/documentation/transforms/python/aggregation/combinevalues/)
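The commutativity and associativity requirement exists because a runner may combine arbitrary subsets of the values first and then merge the partial results, as this plain-Python sketch shows:

```python
def combine_sum(values):
    # A valid combining function: sum is commutative and associative.
    return sum(values)

values = [1, 2, 3, 4]

# A runner might combine partitions independently (possibly on
# different workers) and then combine the partial results:
partials = [combine_sum([1, 2]), combine_sum([3, 4])]  # [3, 7]
merged = combine_sum(partials)                          # 10

# Either evaluation order yields the same answer:
# merged == combine_sum(values) == 10
```

A function like subtraction, which is neither commutative nor associative, would give different answers depending on how the runner partitioned the values.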
+
+## Composite transform
+
+A PTransform that expands into many PTransforms. Composite transforms have a
nested structure, in which a complex transform applies one or more simpler
transforms. These simpler transforms could be existing Beam operations like
ParDo, Combine, or GroupByKey, or they could be other composite transforms.
Nesting multiple transforms inside a single composite transform can make your
pipeline more modular and easier to understand.
+
+To learn more, see:
+
+* [Composite transforms](/documentation/programming-guide/#composite-transforms)
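The nesting idea can be sketched in plain Python (a toy analogy, not the Beam API): one named transform that expands into simpler element-wise steps.

```python
def word_lengths(lines):
    """Toy composite transform: expands into two simpler steps,
    analogous to nesting a FlatMap and a Map inside one PTransform."""
    words = [w for line in lines for w in line.split()]  # step 1: split (FlatMap-like)
    return [len(w) for w in words]                       # step 2: measure (Map-like)

# Callers see one modular operation, not its internal steps:
word_lengths(["hello world", "beam"])  # [5, 5, 4]
```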
+
+## Counter (metric)
+
+A metric that reports a single long value and can be incremented. In the Beam
model, metrics provide insight into the state of a pipeline, potentially while
the pipeline is running.
+
+To learn more, see:
+
+* [Types of metrics](/documentation/programming-guide/#types-of-metrics)
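A toy counter (not the Beam Metrics API, which is created via the SDK's metrics classes) illustrates the shape of this metric: a single long value that only ever increases.

```python
class ToyCounter:
    """Toy counter metric: a named long value that can be incremented
    from processing code and read out while the job runs."""

    def __init__(self, name):
        self.name = name
        self.value = 0

    def inc(self, n=1):
        self.value += n

empty_lines = ToyCounter("empty_lines")
for line in ["a", "", "b", ""]:
    if not line:
        empty_lines.inc()
# empty_lines.value == 2
```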
+
+## Cross-language transforms
+
+Transforms that can be shared across Beam SDKs. With cross-language
transforms, you can use transforms written in any supported SDK language
(currently, Java and Python) in a pipeline written in a different SDK language.
For example, you could use the Apache Kafka connector from the Java SDK in a
Python streaming pipeline. Cross-language transforms make it possible to
provide new functionality simultaneously in different SDKs.
+
+To learn more, see:
+
+* [Multi-language pipelines](/documentation/programming-guide/#multi-language-pipelines)
+
+## Deferred execution
+
+A feature of the Beam execution model. Beam operations are deferred, meaning
that the result of a given operation may not be available for control flow.
Deferred execution allows the Beam API to support parallel processing of data.
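A plain-Python sketch of the idea (not the Beam API): applying a transform only records it in the pipeline graph, and nothing runs until execution is requested.

```python
class ToyDeferredPipeline:
    """Toy model of deferred execution: transforms are recorded when
    applied and only evaluated when the pipeline is run."""

    def __init__(self):
        self.steps = []

    def apply(self, fn):
        self.steps.append(fn)  # recorded, not executed
        return self

    def run(self, data):
        # Only now does any processing happen; a real runner could
        # parallelize or reorder this work before executing it.
        for fn in self.steps:
            data = fn(data)
        return data

p = ToyDeferredPipeline().apply(lambda xs: [x + 1 for x in xs])
# No results exist yet; they are only available after run():
p.run([1, 2, 3])  # [2, 3, 4]
```

This is why the result of an applied transform cannot be used for ordinary control flow: at construction time it is only a node in the graph, not a value.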
+
+## Distribution (metric)
+
+A metric that reports information about the distribution of reported values.
In the Beam model, metrics provide insight into the state of a pipeline,
potentially while the pipeline is running.
+
+To learn more, see:
+
+* [Types of metrics](/documentation/programming-guide/#types-of-metrics)
+
+## DoFn
+
+A function object used by ParDo (or some other transform) to process the
elements of a PCollection. A DoFn is a user-defined function, meaning that it
contains custom code that defines a data processing task in your pipeline. The
Beam system invokes a DoFn one or more times to process some arbitrary bundle
of elements, but Beam doesn’t guarantee an exact number of invocations.
+
+To learn more, see:
+
+* [ParDo](/documentation/programming-guide/#pardo)
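A plain-Python sketch of the relationship (not the Beam API): the DoFn is the user code invoked per element, and ParDo is the machinery that invokes it, with each invocation free to emit zero or more outputs.

```python
def split_words(element):
    """Toy DoFn: user-defined per-element logic. Yielding lets one
    input element produce zero, one, or many outputs."""
    yield from element.split()

def toy_pardo(dofn, pcollection):
    # Toy ParDo: invoke the DoFn on each element and flatten the
    # outputs. A real runner invokes DoFns over arbitrary bundles
    # and may retry them, hence the no-exact-invocation-count rule.
    return [out for element in pcollection for out in dofn(element)]

toy_pardo(split_words, ["hello world", ""])  # ['hello', 'world']
```

Note the empty string: it produces no outputs at all, which is allowed for a DoFn.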
+
+## Driver
+
+A program that defines your pipeline, including all of the inputs, transforms,
and outputs. To use Beam, you need to create a driver program using classes
from one of the Beam SDKs. The driver program creates a pipeline and specifies
the execution options that tell the pipeline where and how to run. These
options include the runner, which determines what backend your pipeline will
run on.
+
+To learn more, see:
+
+* [Overview](/documentation/programming-guide/#overview)
+
+## Element
+
+The unit of data in a PCollection. Elements in a PCollection can be of any
type, but they must all have the same type. This allows parallel computations
to operate uniformly across the entire collection. Some element types have a
structure that can be introspected (for example, JSON, Protocol Buffer, Avro,
and database records).
+
+To learn more, see:
+
+* [PCollection characteristics](/documentation/programming-guide/#pcollection-characteristics)
+
+## Element-wise
+
+A type of transform that independently processes each element in an input
PCollection. An element-wise transform might output 0, 1, or multiple values
for each input element. This is in contrast to aggregation transforms, which
compute a single value from multiple input elements. Element-wise operations
include Filter, FlatMap, and ParDo.
+
+For a complete list of element-wise transforms, see:
+
+* [Java Transform catalog](/documentation/transforms/java/overview/)
+* [Python Transform catalog](/documentation/transforms/python/overview/)
Review comment:
nit:
```suggestion
* [Java Transform catalog](/documentation/transforms/java/overview/#element-wise)
* [Python Transform catalog](/documentation/transforms/python/overview/#element-wise)
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
Issue Time Tracking
-------------------
Worklog Id: (was: 592828)
Time Spent: 20m (was: 10m)
> Create glossary in the Beam documentation
> -----------------------------------------
>
> Key: BEAM-11759
> URL: https://issues.apache.org/jira/browse/BEAM-11759
> Project: Beam
> Issue Type: New Feature
> Components: website
> Reporter: David Huntsperger
> Assignee: David Huntsperger
> Priority: P2
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Create a glossary to help new users understand Beam terminology.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)