[ https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=293357&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-293357 ]
ASF GitHub Bot logged work on BEAM-7389:
----------------------------------------
Author: ASF GitHub Bot
Created on: 12/Aug/19 20:20
Start Date: 12/Aug/19 20:20
Worklog Time Spent: 10m
Work Description: davidcavazos commented on pull request #9257:
[BEAM-7389] Add DoFn methods sample
URL: https://github.com/apache/beam/pull/9257#discussion_r313109881
##########
File path:
sdks/python/apache_beam/examples/snippets/transforms/element_wise/pardo.py
##########
@@ -81,3 +82,44 @@ def process(self, elem, timestamp=beam.DoFn.TimestampParam, window=beam.DoFn.Win
# pylint: enable=line-too-long
if test:
test(dofn_params)
+
+
+def pardo_dofn_methods(test=None):
+  # [START pardo_dofn_methods]
+  import apache_beam as beam
+
+  class DoFnMethods(beam.DoFn):
+    def __init__(self):
+      print('__init__')
+      self.window = beam.window.GlobalWindow()
+
+    def setup(self):
+      print('setup')
Review comment:
I tried doing that here, but the code sample ended up looking a lot more
cluttered and intimidating. I think it looked better in the docs themselves.
Here's an extract of what I was adding in the docs:
A [`DoFn`](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.core.html#apache_beam.transforms.core.DoFn)
can be customized with a number of methods that can help create more complex
behaviors. You can customize what a worker will do when it starts and shuts
down with `setup` and `teardown`. You can also customize what to do when a
[*bundle of elements*](https://beam.apache.org/documentation/execution-model/#bundling-and-persistence)
starts and when a bundle finishes with `start_bundle` and `finish_bundle`.
* [`DoFn.setup()`](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.core.html#apache_beam.transforms.core.DoFn.setup):
Called *once per `DoFn` instance* when the instance is initialized on a worker,
so it can run more than once on the same worker. This is a good place to
connect to database instances, open network connections, or acquire other
resources.
* [`DoFn.start_bundle()`](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.core.html#apache_beam.transforms.core.DoFn.start_bundle):
Called *once per bundle of elements* before calling `process` on the first
element of the bundle. This is a good place to start keeping track of the
bundle elements.
* [**`DoFn.process(element, *args, **kwargs)`**](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.core.html#apache_beam.transforms.core.DoFn.process)
*[required]:* Called *once per element*; can *yield zero or more elements*.
* [`DoFn.finish_bundle()`](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.core.html#apache_beam.transforms.core.DoFn.finish_bundle):
Called *once per bundle of elements* after calling `process` on the last
element of the bundle; can *yield zero or more elements*. This is a good place
to do batch calls on a bundle of elements, such as running a database query.
For example, you can initialize a batch in `start_bundle`, add elements to the
batch in `process` instead of yielding them, then run a batch query on
those elements in `finish_bundle` and yield all the results.
Note that elements yielded from `finish_bundle` must be of the type
`apache_beam.utils.windowed_value.WindowedValue`. You need to provide a
timestamp as a unix timestamp, which you can get from the last processed
element. You also need to provide a window, which you can likewise get from
the last processed element.
* [`DoFn.teardown()`](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.core.html#apache_beam.transforms.core.DoFn.teardown):
Called *once per `DoFn` instance*, on a best-effort basis, when the instance is
shutting down. This is a good place to close database connections, close
network connections, or release other resources.
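
To make the call order and the batching pattern above concrete, here is a minimal plain-Python sketch. It deliberately does not import `apache_beam`: `BatchingFn`, `fake_batch_query`, and `run_bundles` are hypothetical names made up for illustration, and `run_bundles` only imitates the order in which a runner is described as calling these methods. A real `DoFn` would subclass `beam.DoFn` and yield `WindowedValue` objects from `finish_bundle`, which this sketch skips.

```python
def fake_batch_query(items):
    """Hypothetical stand-in for one batched database call."""
    return [item.upper() for item in items]


class BatchingFn:
    """Plain-Python imitation of a DoFn; it is not a real beam.DoFn."""

    def __init__(self):
        self.calls = []    # records the order of lifecycle calls
        self.query = None  # resource acquired in setup
        self.batch = None  # per-bundle buffer

    def setup(self):
        self.calls.append('setup')
        self.query = fake_batch_query  # e.g. open a connection here

    def start_bundle(self):
        self.calls.append('start_bundle')
        self.batch = []  # start tracking this bundle's elements

    def process(self, element):
        self.calls.append('process')
        self.batch.append(element)  # buffer instead of yielding
        return []

    def finish_bundle(self):
        self.calls.append('finish_bundle')
        # One batched call for the whole bundle, then emit every result.
        for result in self.query(self.batch):
            yield result

    def teardown(self):
        self.calls.append('teardown')
        self.query = None  # e.g. close the connection here


def run_bundles(fn, bundles):
    """Hypothetical driver imitating the call order described above."""
    outputs = []
    fn.setup()
    try:
        for bundle in bundles:
            fn.start_bundle()
            for element in bundle:
                outputs.extend(fn.process(element))
            outputs.extend(fn.finish_bundle())
    finally:
        fn.teardown()
    return outputs


fn = BatchingFn()
results = run_bundles(fn, [['a', 'b'], ['c']])
print(results)
print(fn.calls)
```

With two bundles, `setup` and `teardown` each appear once in `fn.calls`, while `start_bundle` and `finish_bundle` appear once per bundle. All output comes from `finish_bundle`, because `process` only buffers.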
Issue Time Tracking
-------------------
Worklog Id: (was: 293357)
Time Spent: 39.5h (was: 39h 20m)
> Colab examples for element-wise transforms (Python)
> ---------------------------------------------------
>
> Key: BEAM-7389
> URL: https://issues.apache.org/jira/browse/BEAM-7389
> Project: Beam
> Issue Type: Improvement
> Components: website
> Reporter: Rose Nguyen
> Assignee: David Cavazos
> Priority: Minor
> Time Spent: 39.5h
> Remaining Estimate: 0h
>