This is an automated email from the ASF dual-hosted git repository.

mbae pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git


The following commit(s) were added to refs/heads/master by this push:
     new 251dd0c  [BEAM-11545] State & timer for batched RPC calls pattern 
(#13643)
251dd0c is described below

commit 251dd0c0f0eccf83ed7346e385981180d562e1b1
Author: Matthias Baetens <[email protected]>
AuthorDate: Thu Dec 16 20:08:53 2021 +0100

    [BEAM-11545] State & timer for batched RPC calls pattern (#13643)
    
    * [BEAM-11545] State & timer for batched RPC calls pattern
---
 ...lements-for-efficient-external-service-calls.md | 56 ++++++++++++++++++++++
 .../content/en/documentation/patterns/overview.md  |  3 ++
 .../partials/section-menu/en/documentation.html    |  1 +
 3 files changed, 60 insertions(+)

diff --git 
a/website/www/site/content/en/documentation/patterns/grouping-elements-for-efficient-external-service-calls.md
 
b/website/www/site/content/en/documentation/patterns/grouping-elements-for-efficient-external-service-calls.md
new file mode 100644
index 0000000..9409e83
--- /dev/null
+++ 
b/website/www/site/content/en/documentation/patterns/grouping-elements-for-efficient-external-service-calls.md
@@ -0,0 +1,56 @@
+---
+title: "Pattern for grouping elements for efficient external service calls"
+---
+
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Grouping elements for efficient external service calls using the 
`GroupIntoBatches`-transform
+
+{{< language-switcher java py >}}
+
+Usually, you can author an Apache Beam pipeline with out-of-the-box tools and 
transforms such as _ParDo_, _Window_ and _GroupByKey_. However, when you want 
tighter control, you can keep state in an otherwise stateless _DoFn_.
+
+State is kept on a per-key and per-window basis, so the input to your 
stateful DoFn needs to be keyed (e.g. by the customer identifier if you're 
tracking clicks from an e-commerce website).
+
+Examples of use cases are: assigning a unique ID to each element, joining 
streams of data in 'more exotic' ways, or batching up API calls to external 
services. In this section we'll go over the last one in particular.
+
+Make sure to check the 
[docs](https://beam.apache.org/documentation/programming-guide/#state-and-timers)
 for a deeper understanding of state and timers.
+
+The `GroupIntoBatches`-transform uses state and timers under the hood to allow 
the user to exercise tight control over the following parameters:
+
+- `maxBufferingDuration`: limits how long to wait before a batch is emitted.
+- `batchSize`: limits the number of elements in one batch.
+- `batchSizeBytes`: (in Java only) limits the byte size of one batch (using 
the input coder to determine element size).
+- `elementByteSize`: (in Java only) limits the byte size of one batch (using a 
user-defined function to determine element size).
+
+while abstracting away the implementation details from users.
+
+The `withShardedKey()` functionality increases parallelism by spreading one 
key over multiple threads.
+
+The transforms are used in the following way in Java & Python:
+
+{{< highlight java >}}
+input.apply(
+          "Batch Contents",
+          GroupIntoBatches.<String, GenericJson>ofSize(batchSize)
+              .withMaxBufferingDuration(maxBufferingDuration)
+              .withShardedKey())
+{{< /highlight >}}
+
+{{< highlight py >}}
+input | GroupIntoBatches.WithShardedKey(batchSize, maxBufferingDuration)
+{{< /highlight >}}
+
+Applying these transforms outputs batches of elements on a per-key basis, 
which you can then use to call an external API in bulk rather than once per 
element, resulting in lower overhead in your pipeline.
diff --git a/website/www/site/content/en/documentation/patterns/overview.md 
b/website/www/site/content/en/documentation/patterns/overview.md
index b13c5d4..c5e6084 100644
--- a/website/www/site/content/en/documentation/patterns/overview.md
+++ b/website/www/site/content/en/documentation/patterns/overview.md
@@ -51,6 +51,9 @@ Pipeline patterns demonstrate common Beam use cases. Pipeline 
patterns are based
 **Cross-language patterns** - Patterns for creating cross-language pipelines
 * [Cross-language 
patterns](/documentation/patterns/cross-language/#cross-language-transforms)
 
+**State & timers patterns** - Patterns for using state & timers
+* [Grouping elements for efficient external service 
calls](/documentation/patterns/grouping-elements-for-efficient-external-service-calls/#grouping-elements-for-efficient-external-service-calls-using-the-`GroupIntoBatches`-transform)
+
 ## Contributing a pattern
 
 To contribute a new pipeline pattern, create an issue with the 
[`pipeline-patterns` 
label](https://issues.apache.org/jira/browse/BEAM-7449?jql=labels%20%3D%20pipeline-patterns)
 and add details to the issue description. See [Get started 
contributing](/contribute/) for more information.
diff --git 
a/website/www/site/layouts/partials/section-menu/en/documentation.html 
b/website/www/site/layouts/partials/section-menu/en/documentation.html
index 6d1e664..3f09b0e 100644
--- a/website/www/site/layouts/partials/section-menu/en/documentation.html
+++ b/website/www/site/layouts/partials/section-menu/en/documentation.html
@@ -206,6 +206,7 @@
     <li><a href="/documentation/patterns/schema/">Schema</a></li>
     <li><a href="/documentation/patterns/bqml/">BigQuery ML</a></li>
     <li><a href="/documentation/patterns/cross-language/">Cross-language 
transforms</a></li>
+    <li><a 
href="/documentation/patterns/grouping-elements-for-efficient-external-service-calls/">Grouping
 elements for efficient external service calls</a></li>
   </ul>
 </li>
 <li class="section-nav-item--collapsible">
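
The per-key batching described in the new pattern page can be illustrated with a small plain-Python sketch. It is not Beam code and not how `GroupIntoBatches` is implemented. It only mimics the size limit (`batch_size`) and, as an assumption, treats the end of input as the moment a buffering-duration timer would fire. The names `group_into_batches` and `call_service_in_bulk` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Iterator, List, Tuple, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


def group_into_batches(
    pairs: Iterable[Tuple[K, V]], batch_size: int
) -> Iterator[Tuple[K, List[V]]]:
    """Yield (key, batch) pairs holding at most batch_size values each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # One buffer per key, playing the role of per-key state.
    buffers: Dict[K, List[V]] = defaultdict(list)
    for key, value in pairs:
        buffers[key].append(value)
        if len(buffers[key]) == batch_size:
            # The buffer is full: emit it and start a fresh one for this key.
            yield key, buffers.pop(key)
    # Assumption: end of input stands in for the buffering timer firing,
    # so any partially filled buffers are flushed here.
    for key, remaining in buffers.items():
        if remaining:
            yield key, remaining


def call_service_in_bulk(key, batch):
    # Hypothetical placeholder for a single bulk request to an external service.
    return f"{key}: sent {len(batch)} elements in one call"


clicks = [("alice", 1), ("bob", 2), ("alice", 3), ("alice", 4), ("bob", 5)]
batches = list(group_into_batches(clicks, batch_size=2))
responses = [call_service_in_bulk(key, batch) for key, batch in batches]
```

With five keyed elements and a batch size of two, the sketch makes three bulk calls instead of five per-element calls, which is the overhead reduction the pattern aims for.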
