twang126 commented on code in PR #26451:
URL: https://github.com/apache/beam/pull/26451#discussion_r1180643654
##########
sdks/python/apache_beam/yaml/README.md:
##########
@@ -215,6 +215,157 @@ pipeline:
path: /path/to/big.csv
```
+## Windowing
+
+This API can be used to define both streaming and batch pipelines.
+In order to meaningfully aggregate elements in a streaming pipeline,
+some kind of windowing is typically required. Beam's
+[windowing](https://beam.apache.org/documentation/programming-guide/#windowing)
+and
[triggering](https://beam.apache.org/documentation/programming-guide/#triggers)
+can be be declared using the same WindowInto transform available in all other
+SDKs.
+
+```
+pipeline:
+ type: chain
+ transforms:
+ - type: ReadFromPubSub
+ topic: myPubSubTopic
+ - type: WindowInto
+ windowing:
+ type: fixed
+ size: 60
+ - type: SomeAggregation
+ - type: WriteToPubSub
+ topic: anotherPubSubTopic
+```
+
+Rather than using an explicit `WindowInto` operation, one may instead tag a
+transform itself with a specified windowing which will cause its inputs
+(and hence the transform itself) to be applied with that windowing.
+
+```
+pipeline:
+ type: chain
+ transforms:
+ - type: ReadFromPubSub
+ topic: myPubSubTopic
+ - type: SomeAggregation
+ windowing:
+ type: sliding
+ size: 60
+ period: 10
+ - type: WriteToPubSub
+ topic: anotherPubSubTopic
+```
+
+Note that the `Sql` operation itself often a from of aggregation, and applying
+a windowing which will cause all grouping to be done per window.
+
+```
+pipeline:
+ type: chain
+ transforms:
+ - type: ReadFromPubSub
+ topic: myPubSubTopic
+ - type: Sql
+ query: "select col1, count(*) as c from PCOLLECTION"
+ windowing:
+ type: sessions
+ gap: 60
+ - type: WriteToPubSub
+ topic: anotherPubSubTopic
+```
+
+The specified windowing is applied to all inputs, in this case resulting in
+a join per window.
+
+```
+pipeline:
+ - type: ReadFromPubSub
+ name: ReadLeft
+ topic: leftTopic
+
+ - type: ReadFromPubSub
+ name: ReadRight
+ topic: rightTopic
+
+ - type: Sql
+ query: select left.col1, right.col2 from left join right using (col3)
+ input:
+ left: ReadLeft
+ right: ReadRight
+ windowing:
+ type: fixed
+ size: 60
+```
+
+For a transform with no inputs, the specified windowing is instead applied to
+its output(s). As per the Beam model, the windowing is then inherited by all
+consuming operations. This is especially useful for root operations like Read.
+
+```
+pipeline:
+ type: chain
+ transforms:
+ - type: ReadFromPubSub
+ topic: myPubSubTopic
+ windowing:
+ type: fixed
+ size: 60
Review Comment:
I'm curious how you are planning on supporting triggers? Will the trigger
definitions just be listed here as properties of each window?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]