[ https://issues.apache.org/jira/browse/BEAM-7390?focusedWorklogId=458774&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-458774 ]
ASF GitHub Bot logged work on BEAM-7390:
----------------------------------------
Author: ASF GitHub Bot
Created on: 14/Jul/20 16:39
Start Date: 14/Jul/20 16:39
Worklog Time Spent: 10m
Work Description: davidcavazos commented on a change in pull request #12252:
URL: https://github.com/apache/beam/pull/12252#discussion_r454491828
##########
File path: website/www/site/content/en/documentation/transforms/python/aggregation/combineglobally.md
##########
@@ -14,29 +14,197 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
+
# CombineGlobally
-<table align="left">
- <a target="_blank" class="button"
-      href="https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.core.html#apache_beam.transforms.core.CombineGlobally">
-    <img src="https://beam.apache.org/images/logos/sdks/python.png" width="20px" height="20px"
- alt="Pydoc" />
- Pydoc
- </a>
-</table>
-<br><br>
+{{< localstorage language language-py >}}
+{{< button-pydoc path="apache_beam.transforms.core" class="CombineGlobally" >}}
Combines all elements in a collection.
See more information in the [Beam Programming Guide](/documentation/programming-guide/#combine).
## Examples
-See [BEAM-7390](https://issues.apache.org/jira/browse/BEAM-7390) for updates.
-## Related transforms
+In the following examples, we create a pipeline with a `PCollection` of produce.
+Then, we apply `CombineGlobally` in multiple ways to combine all the elements in the `PCollection`.
+
+`CombineGlobally` accepts a function that takes a list of elements as an input, and combines them to return a single element.
+
+### Example 1: Combining with a function
+
+We define a function `get_common_items` which takes a list of sets as an input, and calculates the intersection (common items) of those sets.
+
+{{< highlight py >}}
+{{< code_sample "sdks/python/apache_beam/examples/snippets/transforms/aggregation/combineglobally.py" combineglobally_function >}}
+{{< /highlight >}}
+
+{{< paragraph class="notebook-skip" >}}
+Output `PCollection` after `CombineGlobally`:
+{{< /paragraph >}}
+
+{{< highlight class="notebook-skip" >}}
+{{< code_sample "sdks/python/apache_beam/examples/snippets/transforms/aggregation/combineglobally_test.py" common_items >}}
+{{< /highlight >}}
+
+{{< buttons-code-snippet
+  py="sdks/python/apache_beam/examples/snippets/transforms/aggregation/combineglobally.py" >}}
+
+### Example 2: Combining with a lambda function
+
+We can also use lambda functions to simplify **Example 1**.
+
+{{< highlight py >}}
+{{< code_sample "sdks/python/apache_beam/examples/snippets/transforms/aggregation/combineglobally.py" combineglobally_lambda >}}
+{{< /highlight >}}
+
+{{< paragraph class="notebook-skip" >}}
+Output `PCollection` after `CombineGlobally`:
+{{< /paragraph >}}
+
+{{< highlight class="notebook-skip" >}}
+{{< code_sample "sdks/python/apache_beam/examples/snippets/transforms/aggregation/combineglobally_test.py" common_items >}}
+{{< /highlight >}}
+
+{{< buttons-code-snippet
+  py="sdks/python/apache_beam/examples/snippets/transforms/aggregation/combineglobally.py" >}}
+
+### Example 3: Combining with multiple arguments
+
+You can pass functions with multiple arguments to `CombineGlobally`.
+They are passed as additional positional arguments or keyword arguments to the function.
+
+In this example, the lambda function takes `sets` and `exclude` as arguments.
+
+{{< highlight py >}}
+{{< code_sample "sdks/python/apache_beam/examples/snippets/transforms/aggregation/combineglobally.py" combineglobally_multiple_arguments >}}
+{{< /highlight >}}
+
+{{< paragraph class="notebook-skip" >}}
+Output `PCollection` after `CombineGlobally`:
+{{< /paragraph >}}
+
+{{< highlight class="notebook-skip" >}}
+{{< code_sample "sdks/python/apache_beam/examples/snippets/transforms/aggregation/combineglobally_test.py" common_items_with_exceptions >}}
+{{< /highlight >}}
+
+{{< buttons-code-snippet
+  py="sdks/python/apache_beam/examples/snippets/transforms/aggregation/combineglobally.py" >}}
+
+### Example 4: Combining with side inputs as singletons
+
+If the `PCollection` has a single value, such as the average from another computation,
+passing the `PCollection` as a *singleton* accesses that value.
+
+In this example, we pass a `PCollection` containing the value `'🥕'` as a singleton.
+We then use that value to exclude specific items.
+
+{{< highlight py >}}
+{{< code_sample "sdks/python/apache_beam/examples/snippets/transforms/aggregation/combineglobally.py" combineglobally_side_inputs_singleton >}}
+{{< /highlight >}}
+
+{{< paragraph class="notebook-skip" >}}
+Output `PCollection` after `CombineGlobally`:
+{{< /paragraph >}}
+
+{{< highlight class="notebook-skip" >}}
+{{< code_sample "sdks/python/apache_beam/examples/snippets/transforms/aggregation/combineglobally_test.py" common_items_with_exceptions >}}
+{{< /highlight >}}
+
+{{< buttons-code-snippet
+  py="sdks/python/apache_beam/examples/snippets/transforms/aggregation/combineglobally.py" >}}
+
+### Example 5: Combining with side inputs as iterators
+
+If the `PCollection` has multiple values, pass the `PCollection` as an *iterator*.
+This accesses elements lazily as they are needed,
+so it is possible to iterate over large `PCollection`s that won't fit into memory.
+
+{{< highlight py >}}
+{{< code_sample "sdks/python/apache_beam/examples/snippets/transforms/aggregation/combineglobally.py" combineglobally_side_inputs_iter >}}
+{{< /highlight >}}
+
+{{< paragraph class="notebook-skip" >}}
+Output `PCollection` after `CombineGlobally`:
+{{< /paragraph >}}
+
+{{< highlight class="notebook-skip" >}}
+{{< code_sample "sdks/python/apache_beam/examples/snippets/transforms/aggregation/combineglobally_test.py" common_items_with_exceptions >}}
+{{< /highlight >}}
+
+{{< buttons-code-snippet
+  py="sdks/python/apache_beam/examples/snippets/transforms/aggregation/combineglobally.py" >}}
+
+> **Note**: You can pass the `PCollection` as a *list* with `beam.pvalue.AsList(pcollection)`,
+> but this requires that all the elements fit into memory.
+
+### Example 6: Combining with side inputs as dictionaries
+
+If a `PCollection` is small enough to fit into memory, then that `PCollection` can be passed as a *dictionary*.
+Each element must be a `(key, value)` pair.
+Note that all the elements of the `PCollection` must fit into memory for this.
+If the `PCollection` won't fit into memory, use `beam.pvalue.AsIter(pcollection)` instead.
+
+{{< highlight py >}}
+{{< code_sample "sdks/python/apache_beam/examples/snippets/transforms/aggregation/combineglobally.py" combineglobally_side_inputs_dict >}}
+{{< /highlight >}}
+
+{{< paragraph class="notebook-skip" >}}
+Output `PCollection` after `CombineGlobally`:
+{{< /paragraph >}}
+
+{{< highlight class="notebook-skip" >}}
+{{< code_sample "sdks/python/apache_beam/examples/snippets/transforms/aggregation/combineglobally_test.py" custom_common_items >}}
+{{< /highlight >}}
+
+{{< buttons-code-snippet
+  py="sdks/python/apache_beam/examples/snippets/transforms/aggregation/combineglobally.py" >}}
+
+### Example 7: Combining with a `CombineFn`
+
+The more general way to combine elements, and the most flexible, is with a class that inherits from `CombineFn`.
+
+* [`CombineFn.create_accumulator()`](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.core.html#apache_beam.transforms.core.CombineFn.create_accumulator):
+  Called *once per `CombineFn` instance* when the `CombineFn` instance is initialized.
+  This creates an empty accumulator.
+  For example, an empty accumulator for a sum would be `0`, while an empty accumulator for a product (multiplication) would be `1`.
+
+* [`CombineFn.add_input()`](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.core.html#apache_beam.transforms.core.CombineFn.add_input):
+  Called *once per element*.
+  Takes an accumulator and an input element, combines them and returns the updated accumulator.
+
+* [`CombineFn.merge_accumulators()`](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.core.html#apache_beam.transforms.core.CombineFn.merge_accumulators):
+  Called *once per bundle of elements* after processing the last element of the bundle.
Review comment:
@aaltay can you confirm if this and the rest of the descriptions are
correct? Thanks!
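
For anyone checking the method descriptions quoted above, the accumulator lifecycle can be sketched in plain Python without a Beam dependency. This is only an illustration under stated assumptions: `AverageFn` and the `combine_globally` driver are hypothetical names, the driver is not how any Beam runner is implemented, and `extract_output` is included because `CombineFn` defines it even though the quoted diff is cut off before reaching it. The sketch assumes one accumulator per bundle followed by a single merge; as far as I understand, a real runner may create accumulators and merge them in other groupings, which is worth keeping in mind when judging the "once per instance" and "once per bundle" wording.

```python
class AverageFn:
    """Plain-Python stand-in mirroring the CombineFn method names (hypothetical)."""

    def create_accumulator(self):
        # Empty accumulator: (running sum, element count).
        return (0.0, 0)

    def add_input(self, accumulator, element):
        # Fold one element into the accumulator and return the updated copy.
        total, count = accumulator
        return (total + element, count + 1)

    def merge_accumulators(self, accumulators):
        # Combine several partial accumulators into one.
        total, count = 0.0, 0
        for partial_total, partial_count in accumulators:
            total += partial_total
            count += partial_count
        return (total, count)

    def extract_output(self, accumulator):
        # Turn the final accumulator into the output value.
        total, count = accumulator
        return total / count if count else float('nan')


def combine_globally(combine_fn, bundles):
    """Hypothetical driver: one accumulator per bundle, then a single merge."""
    partials = []
    for bundle in bundles:
        accumulator = combine_fn.create_accumulator()
        for element in bundle:
            accumulator = combine_fn.add_input(accumulator, element)
        partials.append(accumulator)
    merged = combine_fn.merge_accumulators(partials)
    return combine_fn.extract_output(merged)


# Three "bundles" standing in for how a runner might split the input.
result = combine_globally(AverageFn(), [[1, 2, 3], [4, 5], [6]])
print(result)
```

The result should not depend on how the elements are split into bundles, which is why `add_input` and `merge_accumulators` have to agree on the accumulator's shape.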
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
Issue Time Tracking
-------------------
Worklog Id: (was: 458774)
Time Spent: 15h 10m (was: 15h)
> Colab examples for aggregation transforms (Python)
> --------------------------------------------------
>
> Key: BEAM-7390
> URL: https://issues.apache.org/jira/browse/BEAM-7390
> Project: Beam
> Issue Type: Improvement
> Components: website
> Reporter: Rose Nguyen
> Priority: P3
> Time Spent: 15h 10m
> Remaining Estimate: 0h
>
> Merge aggregation Colabs into the transform catalog