Polber commented on code in PR #30269:
URL: https://github.com/apache/beam/pull/30269#discussion_r1506183891
##########
website/www/site/content/en/documentation/programming-guide.md:
##########
@@ -1535,7 +1670,13 @@ automatically apply some optimizations:
##### 4.2.4.1. Simple combinations using simple functions {#simple-combines}
+<span class="language-yaml">
+Beam YAML has the following built-in CombineFns: count, sum, min, max,
+mean, any, all, group, and concat.
+CombineFns from other languages can also be referenced.
Review Comment:
Should there be a mention of aggregation being experimental in YAML?
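For context, usage of one of these built-in CombineFns looks something like the following sketch (transform and field names are illustrative; the `group_by`/`combine` config shape is assumed from the `Combine` transform):

```yaml
pipeline:
  type: chain
  transforms:
    - type: ReadFromCsv
      config:
        path: /path/to/sales*.csv
    # Aggregate the `amount` field per `category` using the built-in `sum`.
    - type: Combine
      config:
        group_by: category
        combine:
          amount: sum
    - type: WriteToJson
      config:
        path: /path/to/totals.json
```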
##########
website/www/site/content/en/documentation/programming-guide.md:
##########
@@ -46,6 +46,11 @@ The [Go SDK](https://pkg.go.dev/github.com/apache/beam/sdks/v2/go/pkg/beam) supp
The Typescript SDK supports Node v16+ and is still experimental.
{{< /paragraph >}}
+{{< paragraph class="language-yaml">}}
+Yaml is supported as of Beam 2.52, but is under active development and the most
Review Comment:
```suggestion
YAML is supported as of Beam 2.52, but is under active development and the most
```
##########
website/www/site/content/en/documentation/programming-guide.md:
##########
@@ -93,6 +98,11 @@ include:
* I/O transforms: Beam comes with a number of "IOs" - library `PTransform`s that
read or write data to various external storage systems.
+<span class="language-yaml">
+Note that in Beam YAML `PCollection`s are either implicit (e.g. when using
Review Comment:
nit:
```suggestion
Note that in Beam YAML, `PCollection`s are either implicit (e.g. when using
```
##########
website/www/site/content/en/documentation/programming-guide.md:
##########
@@ -3129,6 +3310,16 @@ Unfortunately type information in Typescript is not propagated to the runtime la
so it needs to be manually specified in some places (e.g. when using
cross-language pipelines).
{{< /paragraph >}}
+{{< paragraph class="language-yaml">}}
+In Beam YAML, all transforms produce and accept schema'd data which is used to validate the pipeline.
+{{< /paragraph >}}
+
+{{< paragraph class="language-yaml">}}
+In some cases Beam is unable to figure out the output type of a mapping function.
+In this case you can specify it manually using
Review Comment:
nit:
```suggestion
In some cases, Beam is unable to figure out the output type of a mapping function.
In this case, you can specify it manually using
```
##########
website/www/site/content/en/documentation/programming-guide.md:
##########
@@ -628,6 +694,47 @@ the transform itself as an argument, and the operation returns the output
[Output PCollection] = await [Input PCollection].applyAsync([AsyncTransform])
{{< /highlight >}}
+{{< highlight yaml >}}
+pipeline:
+ transforms:
+ ...
+ - name: ProducingTransform
+ type: ProducingTransformType
+ ...
+
+ - name: MyTransform
+ type: MyTransformType
+ input: ProducingTransform.output_name
+ ...
+{{< /highlight >}}
+
+{{< paragraph class="language-yaml">}}
+The `.output_name` designation can be omitted for those transforms
+with a single (non-error) output.
+{{< /paragraph >}}
Review Comment:
This seems a bit confusing - it reads like it is typical to specify the
output, when in practice this is rarely done (except error handling). What if
the logic was reversed by having the example be omitting the named output, and
then mentioning that transforms with multiple outputs can use the
`.output_name` notation (and link to error_handling doc)
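e.g. the example could read something like this (transform names here are placeholders):

```yaml
pipeline:
  transforms:
    ...
    - name: ProducingTransform
      type: ProducingTransformType
      ...

    # Typical case: a single-output producer is referenced by name alone.
    - name: MyTransform
      type: MyTransformType
      input: ProducingTransform
      ...

    # A transform with multiple outputs can select one with dot notation.
    - name: HandleErrors
      type: ErrorHandlingTransformType
      input: MyTransform.my_error_output
```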
##########
website/www/site/content/en/documentation/programming-guide.md:
##########
@@ -350,6 +383,11 @@ After you've created your `Pipeline`, you'll need to begin by creating at least
one `PCollection` in some form. The `PCollection` you create serves as the input
for the first operation in your pipeline.
+<span class="language-yaml">
+In Beam YAML `PCollection`s are either implicit (e.g. when using `chain`)
Review Comment:
nit:
```suggestion
In Beam YAML, `PCollection`s are either implicit (e.g. when using `chain`)
```
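For reference, the implicit (`chain`) style looks something like this sketch (the paths and the `line` field name are assumptions):

```yaml
pipeline:
  type: chain
  transforms:
    - type: ReadFromText
      config:
        path: /path/to/input*.txt
    - type: Filter
      config:
        language: python
        keep: "len(line) > 0"
    - type: WriteToText
      config:
        path: /path/to/output.txt
```

No `input` is specified anywhere; each transform implicitly consumes the output of the one before it.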
##########
website/www/site/content/en/documentation/programming-guide.md:
##########
@@ -796,9 +903,18 @@ input into a different format; you might also use `ParDo` to convert processed
data into a format suitable for output, like database table rows or printable
strings.
+{{< paragraph class="language-java language-go language-py language-typescript">}}
When you apply a `ParDo` transform, you'll need to provide user code in the form
of a `DoFn` object. `DoFn` is a Beam SDK class that defines a distributed
processing function.
+{{< /paragraph >}}
+
+{{< paragraph class="language-yaml">}}
+In Beam YAML `ParDo` operations are expressed by the `MapToFields`, `Filter`,
+and `Explode` transform types which may take a UDF in the language of your
+choice rather than introducing the notion of a `DoFn`.
Review Comment:
nit:
```suggestion
In Beam YAML, `ParDo` operations are expressed by the `MapToFields`, `Filter`,
and `Explode` transform types. These types can take a UDF in the language of your
choice, rather than introducing the notion of a `DoFn`.
```
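For example, a `MapToFields` with an inline Python UDF looks roughly like this (field names are illustrative):

```yaml
- type: MapToFields
  config:
    language: python
    fields:
      # New field computed by an inline Python callable over each row.
      shouted:
        callable: |
          def shout(row):
            return row.text.upper()
```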
##########
website/www/site/content/en/documentation/programming-guide.md:
##########
@@ -202,6 +220,12 @@ One can either construct one manually, but it is also common to pass an object
created from command line options such as `yargs.argv`.
{{< /paragraph >}}
+{{< paragraph class="language-yaml">}}
+Pipeline options are simply an optional yaml mapping property that is a sibling to
Review Comment:
nit:
```suggestion
Pipeline options are simply an optional YAML mapping property that is a sibling to
```
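i.e. something like this (the `streaming` option is just an example):

```yaml
pipeline:
  transforms:
    ...

# Sibling mapping holding the pipeline options.
options:
  streaming: true
```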
##########
website/www/site/content/en/documentation/programming-guide.md:
##########
@@ -367,19 +405,29 @@ adapters](#pipeline-io). The adapters vary in their exact usage, but all of them
read from some external data source and return a `PCollection` whose elements
represent the data records in that source.
-Each data source adapter has a `Read` transform; to read, you must apply that
-transform to the `Pipeline` object itself.
+Each data source adapter has a `Read` transform; to read,
+<span class="language-java language-py language-go language-typescript">
+you must apply that transform to the `Pipeline` object itself.
+</span>
+<span class="language-yaml">
+place this transform in the `source` or `transforms` portion of the pipeline.
+</span>
<span class="language-java">`TextIO.Read`</span>
<span class="language-py">`io.TextFileSource`</span>
<span class="language-go">`textio.Read`</span>
<span class="language-typescript">`textio.ReadFromText`</span>,
+<span class="language-yaml">`ReadFromText`</span>,
for example, reads from an
-external text file and returns a `PCollection` whose elements are of type
-`String`, each `String` represents one line from the text file. Here's how you
+external text file and returns a `PCollection` whose elements
+<span class="language-java language-py language-go language-typescript">
+are of type `String`, each `String`
Review Comment:
```suggestion
are of type `String` where each `String`
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]