lukecwik commented on a change in pull request #13160:
URL: https://github.com/apache/beam/pull/13160#discussion_r510397882
##########
File path: website/www/site/content/en/documentation/programming-guide.md
##########
@@ -5143,3 +5143,282 @@ perUser.apply(ParDo.of(new DoFn<KV<String, ValueT>, OutputT>() {
}
}));
{{< /highlight >}}
+
+## 12 Splittable DoFns {#splittable-dofns}
+
+Splittable DoFns (SDFs) enable users to create modular components containing I/Os (and some advanced
+[non I/O use cases](https://s.apache.org/splittable-do-fn#heading=h.5cep9s8k4fxv)). Having modular
+I/O components that can be connected to each other simplifies typical patterns that users want.
+For example, a popular use case is to read filenames from a message queue followed by parsing those
+files. Traditionally, users were required to either write a single I/O connector that contained the
+logic for the message queue and the file reader (increased complexity) or choose to reuse a message
+queue I/O followed by a regular DoFn that read the file (decreased performance). With splittable DoFns,
+we bring the richness of Apache Beam’s I/O APIs to DoFns, enabling modularity while maintaining the
+performance of traditional I/O connectors.
+
+### 12.1 Splittable DoFn basics {#splittable-dofn-basics}
+
+At a high level, a splittable DoFn is responsible for processing element and restriction pairs. A
+restriction represents a subset of the work that would otherwise have been necessary when
+processing the element.
+
+Executing a splittable DoFn involves the following steps:
+
+1. Each element is paired with a restriction (e.g. a filename is paired with an offset range representing the whole file).
+2. Each element and restriction pair is split (e.g. offset ranges are broken up into smaller pieces).
+3. The runner redistributes the element and restriction pairs to several workers.
+4. Element and restriction pairs are processed in parallel (e.g. the file is read).
Review comment:
Below the diagram I have the explanation for the checkpoint/split. I
moved it to be part of the list.
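
For readers following the thread, the four steps in the quoted list can be sketched in plain Python. This is only an illustration under stated assumptions: it does not use the Beam SDK, and `OffsetRange`, `initial_restriction`, `split`, `distribute`, `process`, `run`, and the in-memory `FILES` map are hypothetical names invented here. The round-robin assignment merely stands in for whatever a real runner does, and the checkpoint/split behavior mentioned in the comment is not modeled.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class OffsetRange:
    """Hypothetical restriction: the half-open byte range [start, stop)."""
    start: int
    stop: int


def initial_restriction(file_size: int) -> OffsetRange:
    # Step 1: pair the element with a restriction covering the whole file.
    return OffsetRange(0, file_size)


def split(restriction: OffsetRange, chunk: int) -> List[OffsetRange]:
    # Step 2: break the restriction into pieces of at most `chunk` bytes.
    return [
        OffsetRange(lo, min(lo + chunk, restriction.stop))
        for lo in range(restriction.start, restriction.stop, chunk)
    ]


def distribute(pairs: List[Tuple[str, OffsetRange]],
               num_workers: int) -> List[List[Tuple[str, OffsetRange]]]:
    # Step 3: round-robin stand-in for the runner's redistribution.
    workers: List[List[Tuple[str, OffsetRange]]] = [[] for _ in range(num_workers)]
    for i, pair in enumerate(pairs):
        workers[i % num_workers].append(pair)
    return workers


def process(files: Dict[str, bytes], element: str,
            restriction: OffsetRange) -> bytes:
    # Step 4: read only the bytes that the restriction covers.
    return files[element][restriction.start:restriction.stop]


# In-memory stand-in for files on disk (an assumption for this sketch).
FILES = {"a.txt": b"hello splittable dofn"}


def run(files: Dict[str, bytes], chunk: int,
        num_workers: int) -> Dict[Tuple[str, int], bytes]:
    pairs = [
        (name, piece)
        for name, data in files.items()
        for piece in split(initial_restriction(len(data)), chunk)
    ]
    outputs: Dict[Tuple[str, int], bytes] = {}
    # Workers are visited sequentially here; a runner would run them in parallel.
    for worker in distribute(pairs, num_workers):
        for name, piece in worker:
            outputs[(name, piece.start)] = process(files, name, piece)
    return outputs
```

Calling `run(FILES, chunk=8, num_workers=3)` yields one output per element and restriction pair, keyed by file name and start offset. Each pair is processed independently, so the pieces can be handled in any order and reassembled afterwards.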