robertwb commented on code in PR #26451:
URL: https://github.com/apache/beam/pull/26451#discussion_r1180660999
##########
sdks/python/apache_beam/yaml/README.md:
##########
@@ -215,6 +215,157 @@ pipeline:
path: /path/to/big.csv
```
+## Windowing
+
+This API can be used to define both streaming and batch pipelines.
+In order to meaningfully aggregate elements in a streaming pipeline,
+some kind of windowing is typically required. Beam's
+[windowing](https://beam.apache.org/documentation/programming-guide/#windowing)
+and
[triggering](https://beam.apache.org/documentation/programming-guide/#triggers)
+can be be declared using the same WindowInto transform available in all other
+SDKs.
+
+```
+pipeline:
+ type: chain
+ transforms:
+ - type: ReadFromPubSub
+ topic: myPubSubTopic
+ - type: WindowInto
+ windowing:
+ type: fixed
+ size: 60
+ - type: SomeAggregation
+ - type: WriteToPubSub
+ topic: anotherPubSubTopic
+```
+
+Rather than using an explicit `WindowInto` operation, one may instead tag a
+transform itself with a specified windowing which will cause its inputs
+(and hence the transform itself) to be applied with that windowing.
+
+```
+pipeline:
+ type: chain
+ transforms:
+ - type: ReadFromPubSub
+ topic: myPubSubTopic
+ - type: SomeAggregation
+ windowing:
+ type: sliding
+ size: 60
+ period: 10
+ - type: WriteToPubSub
+ topic: anotherPubSubTopic
+```
+
+Note that the `Sql` operation itself often a from of aggregation, and applying
+a windowing which will cause all grouping to be done per window.
+
+```
+pipeline:
+ type: chain
+ transforms:
+ - type: ReadFromPubSub
+ topic: myPubSubTopic
+ - type: Sql
+ query: "select col1, count(*) as c from PCOLLECTION"
+ windowing:
+ type: sessions
+ gap: 60
+ - type: WriteToPubSub
+ topic: anotherPubSubTopic
+```
+
+The specified windowing is applied to all inputs, in this case resulting in
+a join per window.
+
+```
+pipeline:
+ - type: ReadFromPubSub
+ name: ReadLeft
+ topic: leftTopic
+
+ - type: ReadFromPubSub
+ name: ReadRight
+ topic: rightTopic
+
+ - type: Sql
+ query: select left.col1, right.col2 from left join right using (col3)
+ input:
+ left: ReadLeft
+ right: ReadRight
+ windowing:
+ type: fixed
+ size: 60
+```
+
+For a transform with no inputs, the specified windowing is instead applied to
+its output(s). As per the Beam model, the windowing is then inherited by all
+consuming operations. This is especially useful for root operations like Read.
+
+```
+pipeline:
+ type: chain
+ transforms:
+ - type: ReadFromPubSub
+ topic: myPubSubTopic
+ windowing:
+ type: fixed
+ size: 60
Review Comment:
TBD. They'll probably be properties here. (I thought about requiring an
extra level of nesting by putting this under `windowfn` but that felt
needlessly verbose.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]