This is an automated email from the ASF dual-hosted git repository.
altay pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git
The following commit(s) were added to refs/heads/master by this push:
new 27b526f [BEAM-7389] Add code examples for FlatMap page
new 0c56384 Merge pull request #9260 from davidcavazos/flatmap-page
27b526f is described below
commit 27b526f6f4dba9c71e724f8a7766db6820610030
Author: David Cavazos <[email protected]>
AuthorDate: Fri Aug 2 09:55:54 2019 -0700
[BEAM-7389] Add code examples for FlatMap page
---
.../transforms/python/element-wise/filter.md | 2 +-
.../transforms/python/element-wise/flatmap.md | 279 ++++++++++++++++++++-
2 files changed, 272 insertions(+), 9 deletions(-)
diff --git a/website/src/documentation/transforms/python/element-wise/filter.md
b/website/src/documentation/transforms/python/element-wise/filter.md
index 420718c..84e00eb 100644
--- a/website/src/documentation/transforms/python/element-wise/filter.md
+++ b/website/src/documentation/transforms/python/element-wise/filter.md
@@ -42,7 +42,7 @@ on the comparison ordering of the element.
## Examples
-In the following examples, we create a pipeline with a `PCollection` of
produce their icon, name, and duration.
+In the following examples, we create a pipeline with a `PCollection` of
produce with their icon, name, and duration.
Then, we apply `Filter` in multiple ways to filter out produce by their
duration value.
### Example 1: Filtering with a function
diff --git
a/website/src/documentation/transforms/python/element-wise/flatmap.md
b/website/src/documentation/transforms/python/element-wise/flatmap.md
index 4474990..81772ee 100644
--- a/website/src/documentation/transforms/python/element-wise/flatmap.md
+++ b/website/src/documentation/transforms/python/element-wise/flatmap.md
@@ -19,24 +19,287 @@ limitations under the License.
-->
# FlatMap
-<table align="left">
- <a target="_blank" class="button"
+
+<script type="text/javascript">
+localStorage.setItem('language', 'language-py')
+</script>
+
+<table>
+ <td>
+ <a class="button" target="_blank"
href="https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.core.html#apache_beam.transforms.core.FlatMap">
- <img src="https://beam.apache.org/images/logos/sdks/python.png"
width="20px" height="20px"
- alt="Pydoc" />
- Pydoc
+ <img src="https://beam.apache.org/images/logos/sdks/python.png"
+ width="20px" height="20px" alt="Pydoc" />
+ Pydoc
</a>
+ </td>
</table>
<br>
+
Applies a simple 1-to-many mapping function over each element in the
collection.
The many elements are flattened into the resulting collection.
## Examples
-See [BEAM-7389](https://issues.apache.org/jira/browse/BEAM-7389) for updates.
-## Related transforms
+In the following examples, we create a pipeline with a `PCollection` of
produce with their icon, name, and duration.
+Then, we apply `FlatMap` in multiple ways to yield zero or more elements for each input element into the resulting `PCollection`.
+
+`FlatMap` accepts a function that returns an `iterable`,
+where each of the output `iterable`'s elements is an element of the resulting
`PCollection`.
+
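As a rough mental model, the flattening step can be sketched in plain Python without Beam. The helper `flat_map` and the sample strings below are hypothetical; the sketch only mimics the element-wise semantics on an in-memory list and is not how Beam executes the transform.

```python
from itertools import chain


def flat_map(fn, elements):
    """Apply fn to each element and flatten the returned iterables.

    Hypothetical helper for illustration; not part of Apache Beam.
    """
    return list(chain.from_iterable(fn(element) for element in elements))


# Hypothetical sample data standing in for a PCollection of strings.
lines = ['Strawberry Carrot Eggplant', 'Tomato Potato', '']

# str.split returns a list per input; an empty string contributes no outputs.
words = flat_map(str.split, lines)
```

In a Beam pipeline the corresponding step would be `beam.FlatMap(str.split)` applied to a `PCollection`.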
+### Example 1: FlatMap with a predefined function
+
+We use the function `str.split`, which takes a single `str` element and outputs a `list` of `str`s.
+This pipeline splits each input element on whitespace, creating a list of zero or more elements.
+
+```py
+{% github_sample
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/flat_map.py
tag:flat_map_simple %}```
+
+Output `PCollection` after `FlatMap`:
+
+```
+{% github_sample
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/flat_map_test.py
tag:plants %}```
+
+<table>
+ <td>
+ <a class="button" target="_blank"
+
href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/flat_map.py">
+ <img src="https://www.tensorflow.org/images/GitHub-Mark-32px.png"
+ width="20px" height="20px" alt="View on GitHub" />
+ View on GitHub
+ </a>
+ </td>
+</table>
+<br>
+
+### Example 2: FlatMap with a function
+
+We define a function `split_words` which splits an input `str` element using
the delimiter `','` and outputs a `list` of `str`s.
+
+```py
+{% github_sample
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/flat_map.py
tag:flat_map_function %}```
+
+Output `PCollection` after `FlatMap`:
+
+```
+{% github_sample
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/flat_map_test.py
tag:plants %}```
+
+<table>
+ <td>
+ <a class="button" target="_blank"
+
href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/flat_map.py">
+ <img src="https://www.tensorflow.org/images/GitHub-Mark-32px.png"
+ width="20px" height="20px" alt="View on GitHub" />
+ View on GitHub
+ </a>
+ </td>
+</table>
+<br>
+
+### Example 3: FlatMap with a lambda function
+
+For this example, we want to flatten a `PCollection` of lists of `str`s into a
`PCollection` of `str`s.
+Each input element is already an `iterable`, where each element is what we
want in the resulting `PCollection`.
+We use a lambda function that returns the same input element it received.
+
+```py
+{% github_sample
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/flat_map.py
tag:flat_map_lambda %}```
+
+Output `PCollection` after `FlatMap`:
+
+```
+{% github_sample
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/flat_map_test.py
tag:plants %}```
+
+<table>
+ <td>
+ <a class="button" target="_blank"
+
href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/flat_map.py">
+ <img src="https://www.tensorflow.org/images/GitHub-Mark-32px.png"
+ width="20px" height="20px" alt="View on GitHub" />
+ View on GitHub
+ </a>
+ </td>
+</table>
+<br>
+
+### Example 4: FlatMap with a generator
+
+For this example, we want to flatten a `PCollection` of lists of `str`s into a
`PCollection` of `str`s.
+We use a generator to iterate over the input list and yield each of the
elements.
+Each yielded result in the generator is an element in the resulting
`PCollection`.
+
+```py
+{% github_sample
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/flat_map.py
tag:flat_map_generator %}```
+
+Output `PCollection` after `FlatMap`:
+
+```
+{% github_sample
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/flat_map_test.py
tag:plants %}```
+
+<table>
+ <td>
+ <a class="button" target="_blank"
+
href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/flat_map.py">
+ <img src="https://www.tensorflow.org/images/GitHub-Mark-32px.png"
+ width="20px" height="20px" alt="View on GitHub" />
+ View on GitHub
+ </a>
+ </td>
+</table>
+<br>
+
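The generator behaviour described above can be tried in plain Python. This is a sketch under assumptions: the function name `generate_elements` and the nested lists are made up for illustration, and the list comprehension stands in for what `FlatMap` does with each yielded value.

```python
def generate_elements(elements):
    """Yield each item of the input list, one at a time."""
    for element in elements:
        yield element


# Hypothetical sample data: a collection of lists, including an empty one.
nested = [['Strawberry', 'Carrot'], [], ['Tomato']]

# Every yielded value becomes one output element; the empty list yields none.
flat = [item for group in nested for item in generate_elements(group)]
```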
+### Example 5: FlatMapTuple for key-value pairs
+
+If your `PCollection` consists of `(key, value)` pairs,
+you can use `FlatMapTuple` to unpack them into different function arguments.
+
+```py
+{% github_sample
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/flat_map.py
tag:flat_map_tuple %}```
+
+Output `PCollection` after `FlatMapTuple`:
+
+```
+{% github_sample
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/flat_map_test.py
tag:plants %}```
+
+<table>
+ <td>
+ <a class="button" target="_blank"
+
href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/flat_map.py">
+ <img src="https://www.tensorflow.org/images/GitHub-Mark-32px.png"
+ width="20px" height="20px" alt="View on GitHub" />
+ View on GitHub
+ </a>
+ </td>
+</table>
+<br>
+
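The unpacking that `FlatMapTuple` performs can be approximated in plain Python with argument unpacking. The function `format_plant` and the sample pairs are hypothetical, and the comprehension is only a stand-in for the transform.

```python
def format_plant(icon, plant):
    """Yield one formatted string when an icon is present, nothing otherwise."""
    if icon:
        yield '{}{}'.format(icon, plant)


# Hypothetical (key, value) pairs standing in for a PCollection.
pairs = [('*', 'Strawberry'), (None, 'Invalid'), ('#', 'Carrot')]

# Each tuple is unpacked into separate arguments, much like FlatMapTuple does.
formatted = [out for pair in pairs for out in format_plant(*pair)]
```

In a Beam pipeline the corresponding step would be `beam.FlatMapTuple(format_plant)`.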
+### Example 6: FlatMap with multiple arguments
+
+You can pass functions with multiple arguments to `FlatMap`.
+The extra arguments are passed to `FlatMap` as additional positional or keyword arguments, which are forwarded to the function.
+
+In this example, `split_words` takes `text` and `delimiter` as arguments.
+
+```py
+{% github_sample
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/flat_map.py
tag:flat_map_multiple_arguments %}```
+
+Output `PCollection` after `FlatMap`:
+
+```
+{% github_sample
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/flat_map_test.py
tag:plants %}```
+
+<table>
+ <td>
+ <a class="button" target="_blank"
+
href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/flat_map.py">
+ <img src="https://www.tensorflow.org/images/GitHub-Mark-32px.png"
+ width="20px" height="20px" alt="View on GitHub" />
+ View on GitHub
+ </a>
+ </td>
+</table>
+<br>
+
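One way to picture the extra-argument forwarding is with `functools.partial` in plain Python. This is only an analogy under assumptions: the sample strings are made up, and the body of `split_words` is a guess at a function with the signature described above.

```python
from functools import partial


def split_words(text, delimiter=None):
    """Split text on the given delimiter (whitespace when delimiter is None)."""
    return text.split(delimiter)


# Hypothetical sample data standing in for a PCollection of strings.
lines = ['Strawberry,Carrot,Eggplant', 'Tomato,Potato']

# Bind the extra keyword argument, then apply the function to every element.
split_on_comma = partial(split_words, delimiter=',')
words = [word for line in lines for word in split_on_comma(line)]
```

In a Beam pipeline the extra argument would be given directly to the transform, for example `beam.FlatMap(split_words, delimiter=',')`.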
+### Example 7: FlatMap with side inputs as singletons
+
+If the `PCollection` has a single value, such as the average from another
computation,
+passing the `PCollection` as a *singleton* accesses that value.
+
+In this example, we pass a `PCollection` containing the value `','` as a singleton.
+We then use that value as the delimiter for the `str.split` method.
+
+```py
+{% github_sample
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/flat_map.py
tag:flat_map_side_inputs_singleton %}```
+
+Output `PCollection` after `FlatMap`:
+
+```
+{% github_sample
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/flat_map_test.py
tag:plants %}```
+
+<table>
+ <td>
+ <a class="button" target="_blank"
+
href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/flat_map.py">
+ <img src="https://www.tensorflow.org/images/GitHub-Mark-32px.png"
+ width="20px" height="20px" alt="View on GitHub" />
+ View on GitHub
+ </a>
+ </td>
+</table>
+<br>
+
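The singleton idea can be sketched in plain Python as "a collection that must hold exactly one value". The helper `as_singleton` is hypothetical and merely imitates the contract; in Beam the side input would be wrapped with `beam.pvalue.AsSingleton(...)`.

```python
def as_singleton(collection):
    """Return the only value of a one-element collection.

    Hypothetical stand-in for a singleton side input; not a Beam API.
    """
    values = list(collection)
    if len(values) != 1:
        raise ValueError(
            'expected exactly one element, got {}'.format(len(values)))
    return values[0]


# Hypothetical sample data: a one-element collection and the main input.
delimiter_collection = [',']
lines = ['Strawberry,Carrot,Eggplant', 'Tomato,Potato']

# The single value is extracted once and used as the split delimiter.
delimiter = as_singleton(delimiter_collection)
words = [word for line in lines for word in line.split(delimiter)]
```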
+### Example 8: FlatMap with side inputs as iterators
+
+If the `PCollection` has multiple values, pass the `PCollection` as an
*iterator*.
+This accesses elements lazily as they are needed,
+so it is possible to iterate over large `PCollection`s that won't fit into
memory.
+
+```py
+{% github_sample
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/flat_map.py
tag:flat_map_side_inputs_iter %}```
+
+Output `PCollection` after `FlatMap`:
+
+```
+{% github_sample
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/flat_map_test.py
tag:valid_plants %}```
+
+<table>
+ <td>
+ <a class="button" target="_blank"
+
href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/flat_map.py">
+ <img src="https://www.tensorflow.org/images/GitHub-Mark-32px.png"
+ width="20px" height="20px" alt="View on GitHub" />
+ View on GitHub
+ </a>
+ </td>
+</table>
+<br>
+
+> **Note**: You can pass the `PCollection` as a *list* with
`beam.pvalue.AsList(pcollection)`,
+> but this requires that all the elements fit into memory.
+
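The lazy access pattern of an iterator side input can be imitated in plain Python with a re-iterable object. The class `LazyValues`, the function `valid_only`, and the plant records are all hypothetical; the sketch only shows that the side values are scanned one at a time rather than materialised as a list.

```python
class LazyValues:
    """Re-iterable that produces values one at a time.

    Hypothetical stand-in for an iterator side input; not a Beam API.
    """

    def __init__(self, source):
        self._source = source

    def __iter__(self):
        for value in self._source:
            yield value


def valid_only(plant, valid_durations):
    """Yield the plant only if its duration appears in the side values."""
    # any() stops scanning at the first matching duration.
    if any(plant['duration'] == duration for duration in valid_durations):
        yield plant


# Hypothetical sample data.
valid_durations = LazyValues(['annual', 'biennial', 'perennial'])
plants = [
    {'name': 'Strawberry', 'duration': 'perennial'},
    {'name': 'Carrot', 'duration': 'biennial'},
    {'name': 'Potato', 'duration': 'unknown'},
]

valid_plants = [
    out for plant in plants for out in valid_only(plant, valid_durations)]
names = [plant['name'] for plant in valid_plants]
```

In a Beam pipeline the side input would be wrapped with `beam.pvalue.AsIter(...)` instead of a hand-written class.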
+### Example 9: FlatMap with side inputs as dictionaries
+
+If a `PCollection` is small enough to fit into memory, then that `PCollection`
can be passed as a *dictionary*.
+Each element must be a `(key, value)` pair.
+Note that all the elements of the `PCollection` must fit into memory for this.
+If the `PCollection` won't fit into memory, use
`beam.pvalue.AsIter(pcollection)` instead.
+
+```py
+{% github_sample
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/flat_map.py
tag:flat_map_side_inputs_dict %}```
+
+Output `PCollection` after `FlatMap`:
+
+```
+{% github_sample
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/flat_map_test.py
tag:valid_plants %}```
+
+<table>
+ <td>
+ <a class="button" target="_blank"
+
href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/flat_map.py">
+ <img src="https://www.tensorflow.org/images/GitHub-Mark-32px.png"
+ width="20px" height="20px" alt="View on GitHub" />
+ View on GitHub
+ </a>
+ </td>
+</table>
+<br>
+
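A dictionary side input behaves like an ordinary in-memory `dict` built from `(key, value)` pairs. The sketch below is plain Python under assumptions: `replace_duration` and the sample records are hypothetical, and the lookup table is built by hand rather than with `beam.pvalue.AsDict(...)`.

```python
def replace_duration(plant, durations):
    """Yield a copy of the plant with its duration code replaced by a name.

    Plants whose code is missing from the lookup table produce no output.
    """
    if plant['duration'] in durations:
        yield dict(plant, duration=durations[plant['duration']])


# Hypothetical (key, value) pairs; the whole table is held in memory.
durations = dict([(0, 'annual'), (1, 'biennial'), (2, 'perennial')])
plants = [
    {'name': 'Strawberry', 'duration': 2},
    {'name': 'Carrot', 'duration': 1},
    {'name': 'Potato', 'duration': -1},
]

valid_plants = [
    out for plant in plants for out in replace_duration(plant, durations)]
```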
+## Related transforms
+
* [Filter]({{ site.baseurl
}}/documentation/transforms/python/elementwise/filter) is useful if the
function is just
deciding whether to output an element or not.
* [ParDo]({{ site.baseurl
}}/documentation/transforms/python/elementwise/pardo) is the most general
element-wise mapping
operation, and includes other abilities such as multiple output collections
and side-inputs.
-* [Map]({{ site.baseurl }}/documentation/transforms/python/elementwise/map)
behaves the same, but produces exactly one output for each input.
\ No newline at end of file
+* [Map]({{ site.baseurl }}/documentation/transforms/python/elementwise/map)
behaves the same, but produces exactly one output for each input.
+
+<table>
+ <td>
+ <a class="button" target="_blank"
+
href="https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.core.html#apache_beam.transforms.core.FlatMap">
+ <img src="https://beam.apache.org/images/logos/sdks/python.png"
+ width="20px" height="20px" alt="Pydoc" />
+ Pydoc
+ </a>
+ </td>
+</table>
+<br>