robertwb commented on code in PR #30368:
URL: https://github.com/apache/beam/pull/30368#discussion_r1505182035
##########
website/www/site/content/en/documentation/sdks/yaml-udf.md:
##########
@@ -207,6 +207,73 @@ criteria. This can be accomplished with a `Filter`
transform, e.g.
keep: "col2 > 0"
```
+## Splitting
+
+It can also be useful to send different elements to different places
+(similar to what is done with side outputs in other SDKs).
+While this can be done with a set of `Filter` operations, if every
+element has a single destination it can be more natural to use a `Split`
+transform instead, which sends every element to a unique output.
+For example, this will send all elements where `col1` is equal to `"a"` to the
+output `Split.a`.
+
+```
+- type: Split
+  input: input
+  config:
+    destination: col1
+    outputs: ['a', 'b', 'c']
+
+- type: SomeTransform
+  input: Split.a
+  config:
+    param: ...
+
+- type: AnotherTransform
+  input: Split.b
+  config:
+    param: ...
+```
+
+One can also specify the destination as a function, e.g.
+
+```
+- type: Split
+  input: input
+  config:
+    language: python
+    destination: "'even' if col2 % 2 == 0 else 'odd'"
+    outputs: ['even', 'odd']
+```
+
+One can optionally provide a catch-all output which will capture all elements
+that are not in the named outputs (which would otherwise be an error):
+
+```
+- type: Split
+  input: input
+  config:
+    destination: col1
+    outputs: ['a', 'b', 'c']
+    unknown_output: 'other'
+```
+
+To send elements to multiple (or no) outputs, one could use an iterable column
+and precede the `Split` with an `Explode`.
Review Comment:
Side outputs let you emit to zero or more outputs, not just 1:1, which is
why I included this here.
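
To make the zero-or-more case concrete, here is a minimal plain-Python model
of the semantics being discussed. It is not Beam code: `explode`, `split`, and
the `tags` column are hypothetical names used only for illustration. Exploding
an iterable column first means a row with several values reaches several
outputs, and a row with an empty iterable reaches none.

```python
from collections import defaultdict


def explode(rows, field):
    """Yield one copy of each row per value in the iterable column `field`."""
    for row in rows:
        for value in row[field]:
            yield {**row, field: value}


def split(rows, destination, outputs, unknown_output=None):
    """Route each row to exactly one output, keyed by row[destination]."""
    routed = defaultdict(list)
    for row in rows:
        key = row[destination]
        if key in outputs:
            routed[key].append(row)
        elif unknown_output is not None:
            routed[unknown_output].append(row)
        else:
            # Mirrors the "would otherwise be an error" case in the docs.
            raise ValueError(f"Unknown output {key!r}")
    return dict(routed)


rows = [
    {"id": 1, "tags": ["a", "b"]},  # ends up in two outputs
    {"id": 2, "tags": []},          # ends up in no output
    {"id": 3, "tags": ["c"]},       # ends up in one output
]
result = split(explode(rows, "tags"), "tags", outputs=["a", "b", "c"])
```

Here `split` on its own stays strictly one-to-one; the fan-out comes entirely
from the `explode` step in front of it.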
##########
website/www/site/content/en/documentation/sdks/yaml-udf.md:
##########
@@ -207,6 +207,73 @@ criteria. This can be accomplished with a `Filter`
transform, e.g.
keep: "col2 > 0"
```
+## Splitting
+
+It can also be useful to send different elements to different places
+(similar to what is done with side outputs in other SDKs).
+While this can be done with a set of `Filter` operations, if every
+element has a single destination it can be more natural to use a `Split`
+transform instead, which sends every element to a unique output.
+For example, this will send all elements where `col1` is equal to `"a"` to the
+output `Split.a`.
+
+```
+- type: Split
+  input: input
+  config:
+    destination: col1
+    outputs: ['a', 'b', 'c']
+
+- type: SomeTransform
+  input: Split.a
+  config:
+    param: ...
+
+- type: AnotherTransform
+  input: Split.b
+  config:
+    param: ...
+```
Review Comment:
Yeah, I went back and forth on the naming here. I like `on` a lot better.
As for non-strings, we have two options: either require a string or convert.
The latter could get dicey (both for composites and even for some primitives,
e.g. is it `True` or `true`?), so I think I'll stick with the first for now
(which can always be relaxed in the future if we decide on semantics there).
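
The `True` vs `true` ambiguity is easy to reproduce in plain Python. The
sketch below only illustrates why implicit conversion is underspecified;
`require_string` is a hypothetical helper, not something in this PR.

```python
import json

value = True
python_style = str(value)       # Python's own spelling of the boolean
json_style = json.dumps(value)  # JSON-style spelling of the same value


def require_string(value):
    """Hypothetical strict check: reject non-string destinations outright."""
    if not isinstance(value, str):
        raise TypeError(
            f"destination must be a string, got {type(value).__name__}"
        )
    return value
```

The two conversions give different strings for the same value, so an output
named after either spelling would silently miss under the other convention.
Requiring a string sidesteps that choice.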