kerrydc commented on code in PR #23085:
URL: https://github.com/apache/beam/pull/23085#discussion_r1008288264


##########
learning/tour-of-beam/learning-content/go/introduction/introduction-concepts/creating-collections/from-memory/description.md:
##########
@@ -0,0 +1,56 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+### Creating PCollection
+
+Now that you know how to create a Beam pipeline and pass parameters into it, 
it is time to learn how to create an initial `PCollection` and fill it with 
data.
+
+There are several options on how to do that:

Review Comment:
   There are several options:



##########
learning/tour-of-beam/learning-content/go/introduction/introduction-concepts/creating-collections/from-memory/description.md:
##########
@@ -0,0 +1,56 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+### Creating PCollection
+
+Now that you know how to create a Beam pipeline and pass parameters into it, 
it is time to learn how to create an initial `PCollection` and fill it with 
data.
+
+There are several options on how to do that:
+
+→ You can create a PCollection of data stored in an in-memory collection class 
in your driver program.
+
+→ You can also read the data from a variety of external sources such as local 
and cloud-based files, databases, or other sources using Beam-provided I/O 
adapters

Review Comment:
   local or cloud-based files



##########
learning/tour-of-beam/learning-content/go/introduction/introduction-concepts/creating-collections/from-memory/description.md:
##########
@@ -0,0 +1,56 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+### Creating PCollection
+
+Now that you know how to create a Beam pipeline and pass parameters into it, 
it is time to learn how to create an initial `PCollection` and fill it with 
data.
+
+There are several options on how to do that:
+
+→ You can create a PCollection of data stored in an in-memory collection class 
in your driver program.
+
+→ You can also read the data from a variety of external sources such as local 
and cloud-based files, databases, or other sources using Beam-provided I/O 
adapters
+
+Through the tour, most of the examples use either `PCollection` created from 
in-memory data or data read from one of the cloud buckets: beam-examples, 
dataflow-samples. These buckets contain sample data sets specifically created 
for educational purposes.
+
+We encourage you to take a look, explore these data sets and use them while 
learning Apache Beam.
+
+### Creating a PCollection from in-memory data
+
+You can use the Beam-provided Create transform to create a `PCollection` from 
an in-memory Go collection. You can apply the Create transform directly to your 
Pipeline object itself.
+
+The following example code shows how to do this:
+
+```
+func main() {
+    ctx := context.Background()
+
+    // First create pipline

Review Comment:
   typo: pipline



##########
learning/tour-of-beam/learning-content/go/introduction/introduction-concepts/creating-collections/from-memory/description.md:
##########
@@ -0,0 +1,56 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+### Creating PCollection
+
+Now that you know how to create a Beam pipeline and pass parameters into it, 
it is time to learn how to create an initial `PCollection` and fill it with 
data.
+
+There are several options on how to do that:
+
+→ You can create a PCollection of data stored in an in-memory collection class 
in your driver program.
+
+→ You can also read the data from a variety of external sources such as local 
and cloud-based files, databases, or other sources using Beam-provided I/O 
adapters
+
+Through the tour, most of the examples use either `PCollection` created from 
in-memory data or data read from one of the cloud buckets: beam-examples, 
dataflow-samples. These buckets contain sample data sets specifically created 
for educational purposes.
+
+We encourage you to take a look, explore these data sets and use them while 
learning Apache Beam.
+
+### Creating a PCollection from in-memory data
+
+You can use the Beam-provided Create transform to create a `PCollection` from 
an in-memory Go Collection. You can apply Create transform directly to your 
Pipeline object itself.
+
+The following example code shows how to do this:
+
+```
+func main() {
+    ctx := context.Background()
+
+    // First create pipline
+    p, s := beam.NewPipelineWithRoot()
+
+    //Now create the PCollection using list of strings
+    numbers := beam.Create(s, "To", "be", "or", "not", "to", "be","that", 
"is", "the", "question")
+
+    //Create a numerical PCollection
+    numbers := beam.Create(s, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
+
+}
+```
+
+### Playground exercise
+
+You can find the complete code of this example in the playground window you 
can run and experiment with.
+
+One of the differences you will notice is that it also contains the part to 
output `PCollection` elements to the console. Don’t worry if you don’t quite 
understand it, as the concept of `ParDo` transform will be explained later in 
the course. Feel free, however, to use it in exercises and challenges to 
explore results.
+
+Do you also notice in what order elements of PCollection appear in the 
console? Why is that? You can also run the example several times to see if the 
output stays the same or changes.

Review Comment:
   Do you notice the order in which elements of the PCollection appear in the 
console? Why is that? You can also run the example several times to see if the 
output stays the same or changes.



##########
learning/tour-of-beam/learning-content/go/introduction/introduction-concepts/creating-collections/reading-from-csv/description.md:
##########
@@ -0,0 +1,33 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+### Read from csv file
+
+Data processing pipelines often work with tabular data. In many examples and 
challenges throughout the course, you’ll be working with one of the datasets 
stored as csv files in either beam-examples, dataflow-samples buckets.
+
+Loading data from csv file requires some processing and consists of two main 
part:
+* Loading text lines using `TextIO.Read` transform
+* Parsing lines of text into tabular format
+
+### Playground exercise
+
+Try to experiment with an example in the playground window and modify the code 
to process other fields from New York taxi rides dataset.

Review Comment:
   Try to experiment with an example in the playground window and modify the 
code to process other fields from the New York taxi rides dataset.



##########
learning/tour-of-beam/learning-content/go/introduction/introduction-concepts/creating-collections/reading-from-csv/description.md:
##########
@@ -0,0 +1,33 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+### Read from csv file
+
+Data processing pipelines often work with tabular data. In many examples and 
challenges throughout the course, you’ll be working with one of the datasets 
stored as csv files in either beam-examples, dataflow-samples buckets.
+
+Loading data from csv file requires some processing and consists of two main 
part:
+* Loading text lines using `TextIO.Read` transform
+* Parsing lines of text into tabular format

Review Comment:
   Parse



##########
learning/tour-of-beam/learning-content/go/introduction/introduction-concepts/creating-collections/reading-from-csv/description.md:
##########
@@ -0,0 +1,33 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+### Read from csv file
+
+Data processing pipelines often work with tabular data. In many examples and 
challenges throughout the course, you’ll be working with one of the datasets 
stored as csv files in either beam-examples, dataflow-samples buckets.
+
+Loading data from csv file requires some processing and consists of two main 
part:
+* Loading text lines using `TextIO.Read` transform
+* Parsing lines of text into tabular format
+
+### Playground exercise
+
+Try to experiment with an example in the playground window and modify the code 
to process other fields from New York taxi rides dataset.
+
+Here is a list of fields and a sample record from this dataset:

Review Comment:
   Please describe the fields in more detail, and highlight which fields are 
important for these exercises.



##########
learning/tour-of-beam/learning-content/go/introduction/introduction-concepts/creating-collections/reading-from-text/example/textIo.go:
##########
@@ -0,0 +1,97 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+*/
+// beam-playground:
+//   name: TextIO
+//   description: TextIO example.
+//   multifile: false
+//   context_line: 46
+//   categories:
+//     - Quickstart
+//   complexity: BASIC
+//   tags:
+//     - hellobeam
+
+package main
+
+import (
+    "context"
+    "fmt"
+    "github.com/apache/beam/sdks/v2/go/pkg/beam"
+    "github.com/apache/beam/sdks/v2/go/pkg/beam/io/textio"
+    "github.com/apache/beam/sdks/v2/go/pkg/beam/log"
+    "github.com/apache/beam/sdks/v2/go/pkg/beam/transforms/filter"
+    "github.com/apache/beam/sdks/v2/go/pkg/beam/x/beamx"
+    "regexp"
+    "github.com/apache/beam/sdks/v2/go/pkg/beam/transforms/top"
+)
+
+var (
+    wordRE = regexp.MustCompile(`[a-zA-Z]+('[a-z])?`)
+)
+
+func less(a, b string) bool{
+    return len(a)>len(b)
+}
+
+func main() {
+    p, s := beam.NewPipelineWithRoot()
+
+    file := Read(s, "gs://apache-beam-samples/shakespeare/kinglear.txt")
+
+    lines := getLines(s, file)
+    fixedSizeLines := top.Largest(s,lines,10,less)
+    output(s, "Lines: ", fixedSizeLines)
+
+    words := getWords(s,lines)
+    fixedSizeWords := top.Largest(s,words,10,less)
+    output(s, "Words: ", fixedSizeWords)
+
+    err := beamx.Run(context.Background(), p)
+    if err != nil {
+        log.Exitf(context.Background(), "Failed to execute job: %v", err)
+    }
+}
+
+// Read reads from filename(s) specified by a glob string and returns a 
PCollection<string>.
+func Read(s beam.Scope, glob string) beam.PCollection {
+    return textio.Read(s, glob)
+}
+
+// Read text file content line by line. resulting PCollection contains 
elements, where each element contains a single line of text from the input file.

Review Comment:
   The resulting PCollection contains elements, where each element contains a 
single line of text from the input file.
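The `getWords` body itself is not shown in this hunk; presumably it just applies `wordRE` to each line inside a ParDo. A stdlib-only sketch of that per-line splitting (the `extractWords` helper is hypothetical, not part of the PR):

```go
package main

import (
	"fmt"
	"regexp"
)

// Same pattern as in the file under review: letters with an
// optional apostrophe suffix such as "it's".
var wordRE = regexp.MustCompile(`[a-zA-Z]+('[a-z])?`)

// extractWords is a hypothetical helper showing the splitting a
// getWords ParDo would perform on each element of the lines PCollection.
func extractWords(line string) []string {
	return wordRE.FindAllString(line, -1)
}

func main() {
	fmt.Println(extractWords("To be, or not to be: that is the question"))
	// → [To be or not to be that is the question]
}
```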



##########
learning/tour-of-beam/learning-content/go/introduction/introduction-concepts/creating-collections/from-memory/description.md:
##########
@@ -0,0 +1,56 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+### Creating PCollection
+
+Now that you know how to create a Beam pipeline and pass parameters into it, 
it is time to learn how to create an initial `PCollection` and fill it with 
data.
+
+There are several options on how to do that:
+
+→ You can create a PCollection of data stored in an in-memory collection class 
in your driver program.
+
+→ You can also read the data from a variety of external sources such as local 
and cloud-based files, databases, or other sources using Beam-provided I/O 
adapters
+
+Through the tour, most of the examples use either `PCollection` created from 
in-memory data or data read from one of the cloud buckets: beam-examples, 
dataflow-samples. These buckets contain sample data sets specifically created 
for educational purposes.

Review Comment:
   Through the tour, most of the examples use either a `PCollection` created 
from in-memory data or data read from one of the cloud buckets "beam-examples" 
or "dataflow-samples". These buckets contain sample data sets specifically 
created for educational purposes.



##########
learning/tour-of-beam/learning-content/go/introduction/introduction-concepts/pipeline-concepts/overview-pipeline/description.md:
##########
@@ -0,0 +1,43 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+### Overview
+
+To use Beam, you first need to create a driver program using the classes 
in one of the Beam SDKs. Your driver program defines your pipeline, including 
all of the inputs, transforms, and outputs. It also sets execution options for 
your pipeline (typically passed by using command-line options). These include 
the Pipeline Runner, which, in turn, determines what back-end your pipeline 
will run on.
+
+The Beam SDKs provide several abstractions that simplify the mechanics of 
large-scale distributed data processing. The same Beam abstractions work with 
both batch and streaming data sources. When you create your Beam pipeline, you 
can think about your data processing task in terms of these abstractions. They 
include:
+
+→ `Pipeline`: A Pipeline encapsulates your entire data processing task, from 
start to finish. This includes reading input data, transforming that data, and 
writing output data. All Beam driver programs must create a Pipeline. When you 
create the Pipeline, you must also specify the execution options that tell the 
Pipeline where and how to run.
+
+→ `PCollection`: A PCollection represents a distributed data set that your 
Beam pipeline operates on. The data set can be bounded, meaning it comes from a 
fixed source like a file, or unbounded, meaning it comes from a continuously 
updating source via a subscription or other mechanism. Your pipeline typically 
creates an initial PCollection by reading data from an external data source, 
but you can also create a PCollection from in-memory data within your driver 
program. From there, PCollections are the inputs and outputs for each step in 
your pipeline.
+
+→ `PTransform`: A PTransform represents a data processing operation, or a 
step, in your pipeline. Every PTransform takes one or more PCollection objects 
as the input, performs a processing function that you provide on the elements 
of that PCollection, and then produces zero or more output PCollection objects.
+
+→ `Scope`: The Go SDK has an explicit scope variable used to build a 
`Pipeline`. A Pipeline can return its root scope with the `Root()` method. The 
scope variable is then passed to `PTransform` functions that place them in the 
`Pipeline` that owns the `Scope`.
+
+→ `I/O transforms`: Beam comes with a number of “IOs” - library PTransforms 
that read or write data to various external storage systems.
+
+A typical Beam driver program works as follows:

Review Comment:
   A typical Beam driver program works like this:



##########
learning/tour-of-beam/learning-content/go/introduction/introduction-concepts/creating-collections/from-memory/description.md:
##########
@@ -0,0 +1,56 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+### Creating PCollection
+
+Now that you know how to create a Beam pipeline and pass parameters into it, 
it is time to learn how to create an initial `PCollection` and fill it with 
data.
+
+There are several options on how to do that:
+
+→ You can create a PCollection of data stored in an in-memory collection class 
in your driver program.
+
+→ You can also read the data from a variety of external sources such as local 
and cloud-based files, databases, or other sources using Beam-provided I/O 
adapters
+
+Through the tour, most of the examples use either `PCollection` created from 
in-memory data or data read from one of the cloud buckets: beam-examples, 
dataflow-samples. These buckets contain sample data sets specifically created 
for educational purposes.
+
+We encourage you to take a look, explore these data sets and use them while 
learning Apache Beam.
+
+### Creating a PCollection from in-memory data
+
+You can use the Beam-provided Create transform to create a `PCollection` from 
an in-memory Go collection. You can apply the Create transform directly to your 
Pipeline object itself.
+
+The following example code shows how to do this:
+
+```
+func main() {
+    ctx := context.Background()
+
+    // First create pipline
+    p, s := beam.NewPipelineWithRoot()
+
+    //Now create the PCollection using list of strings
+    numbers := beam.Create(s, "To", "be", "or", "not", "to", "be","that", 
"is", "the", "question")
+
+    //Create a numerical PCollection
+    numbers := beam.Create(s, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
+
+}
+```
+
+### Playground exercise
+
+You can find the complete code of this example in the playground window you 
can run and experiment with.

Review Comment:
   You can find the complete code of this example in the playground window 
where you can run the pipeline and experiment with it.



##########
learning/tour-of-beam/learning-content/go/introduction/introduction-concepts/pipeline-concepts/overview-pipeline/description.md:
##########
@@ -0,0 +1,43 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+### Overview
+
+To use Beam, you first need to create a driver program using the classes 
in one of the Beam SDKs. Your driver program defines your pipeline, including 
all of the inputs, transforms, and outputs. It also sets execution options for 
your pipeline (typically passed by using command-line options). These include 
the Pipeline Runner, which, in turn, determines what back-end your pipeline 
will run on.
+
+The Beam SDKs provide several abstractions that simplify the mechanics of 
large-scale distributed data processing. The same Beam abstractions work with 
both batch and streaming data sources. When you create your Beam pipeline, you 
can think about your data processing task in terms of these abstractions. They 
include:
+
+→ `Pipeline`: A Pipeline encapsulates your entire data processing task, from 
start to finish. This includes reading input data, transforming that data, and 
writing output data. All Beam driver programs must create a Pipeline. When you 
create the Pipeline, you must also specify the execution options that tell the 
Pipeline where and how to run.
+
+→ `PCollection`: A PCollection represents a distributed data set that your 
Beam pipeline operates on. The data set can be bounded, meaning it comes from a 
fixed source like a file, or unbounded, meaning it comes from a continuously 
updating source via a subscription or other mechanism. Your pipeline typically 
creates an initial PCollection by reading data from an external data source, 
but you can also create a PCollection from in-memory data within your driver 
program. From there, PCollections are the inputs and outputs for each step in 
your pipeline.

Review Comment:
   The data set can be bounded, meaning it comes from a fixed source like a 
file, or unbounded, meaning it comes from a continuously updating source via a 
subscription or other mechanism. A pipeline with bounded input is referred to 
as a Batch Pipeline, while an unbounded input is used with a Streaming 
Pipeline. 



##########
learning/tour-of-beam/learning-content/go/introduction/introduction-concepts/pipeline-concepts/overview-pipeline/description.md:
##########
@@ -0,0 +1,43 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+### Overview
+
+To use Beam, you first need to create a driver program using the classes 
in one of the Beam SDKs. Your driver program defines your pipeline, including 
all of the inputs, transforms, and outputs. It also sets execution options for 
your pipeline (typically passed by using command-line options). These include 
the Pipeline Runner, which, in turn, determines what back-end your pipeline 
will run on.
+
+The Beam SDKs provide several abstractions that simplify the mechanics of 
large-scale distributed data processing. The same Beam abstractions work with 
both batch and streaming data sources. When you create your Beam pipeline, you 
can think about your data processing task in terms of these abstractions. They 
include:
+
+→ `Pipeline`: A Pipeline encapsulates your entire data processing task, from 
start to finish. This includes reading input data, transforming that data, and 
writing output data. All Beam driver programs must create a Pipeline. When you 
create the Pipeline, you must also specify the execution options that tell the 
Pipeline where and how to run.
+
+→ `PCollection`: A PCollection represents a distributed data set that your 
Beam pipeline operates on. The data set can be bounded, meaning it comes from a 
fixed source like a file, or unbounded, meaning it comes from a continuously 
updating source via a subscription or other mechanism. Your pipeline typically 
creates an initial PCollection by reading data from an external data source, 
but you can also create a PCollection from in-memory data within your driver 
program. From there, PCollections are the inputs and outputs for each step in 
your pipeline.
+
+→ `PTransform`: A PTransform represents a data processing operation, or a 
step, in your pipeline. Every PTransform takes one or more PCollection objects 
as the input, performs a processing function that you provide on the elements 
of that PCollection, and then produces zero or more output PCollection objects.
+
+→ `Scope`: The Go SDK has an explicit scope variable used to build a 
`Pipeline`. A Pipeline can return its root scope with the `Root()` method. The 
scope variable is then passed to `PTransform` functions that place them in the 
`Pipeline` that owns the `Scope`.
+
+→ `I/O transforms`: Beam comes with a number of “IOs” - library PTransforms 
that read or write data to various external storage systems.
+
+A typical Beam driver program works as follows:
+
+→ Create a Pipeline object and set the pipeline execution options, including 
the Pipeline Runner.
+
+→ Create an initial `PCollection` for pipeline data, either using the IOs to 
read data from an external storage system, or using a Create transform to build 
a `PCollection` from in-memory data.

Review Comment:
   → Create an initial `PCollection` of pipeline data, either by using the IOs 
to read data from an external storage system, or by using a Create transform to 
build a `PCollection` from in-memory data.



##########
learning/tour-of-beam/learning-content/go/introduction/introduction-concepts/creating-collections/reading-from-csv/description.md:
##########
@@ -0,0 +1,33 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+### Read from csv file
+
+Data processing pipelines often work with tabular data. In many examples and 
challenges throughout the course, you’ll be working with one of the datasets 
stored as csv files in either beam-examples, dataflow-samples buckets.
+
+Loading data from csv file requires some processing and consists of two main 
part:
+* Loading text lines using `TextIO.Read` transform

Review Comment:
   Loading data from csv file takes two steps:
   * Load text lines using the `TextIO.Read` transform
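The parsing half of these two steps is ordinary Go, since it runs element-wise inside a ParDo on the lines produced by `TextIO.Read`. A stdlib-only sketch of that inner logic; the column layout (pickup time, passenger count, fare) and the `ride`/`parseLine` names are hypothetical, for illustration only, not the actual bucket schema:

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strconv"
	"strings"
)

// ride is a hypothetical row type for illustration.
type ride struct {
	Pickup     string
	Passengers int
	Fare       float64
}

// parseLine converts one text line, as produced by textio.Read,
// into a ride. Inside a pipeline this would be the body of a ParDo.
func parseLine(line string) (ride, error) {
	rec, err := csv.NewReader(strings.NewReader(line)).Read()
	if err != nil {
		return ride{}, err
	}
	if len(rec) < 3 {
		return ride{}, fmt.Errorf("want 3 fields, got %d", len(rec))
	}
	n, err := strconv.Atoi(rec[1])
	if err != nil {
		return ride{}, err
	}
	fare, err := strconv.ParseFloat(rec[2], 64)
	if err != nil {
		return ride{}, err
	}
	return ride{Pickup: rec[0], Passengers: n, Fare: fare}, nil
}

func main() {
	r, _ := parseLine("2022-01-01 00:10:00,2,14.50")
	fmt.Printf("%+v\n", r)
	// → {Pickup:2022-01-01 00:10:00 Passengers:2 Fare:14.5}
}
```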



##########
learning/tour-of-beam/learning-content/go/introduction/introduction-concepts/creating-collections/from-memory/description.md:
##########
@@ -0,0 +1,56 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+### Creating PCollection
+
+Now that you know how to create a Beam pipeline and pass parameters into it, 
it is time to learn how to create an initial `PCollection` and fill it with 
data.
+
+There are several options on how to do that:
+
+→ You can create a PCollection of data stored in an in-memory collection class 
in your driver program.
+
+→ You can also read the data from a variety of external sources such as local 
and cloud-based files, databases, or other sources using Beam-provided I/O 
adapters
+
+Through the tour, most of the examples use either `PCollection` created from 
in-memory data or data read from one of the cloud buckets: beam-examples, 
dataflow-samples. These buckets contain sample data sets specifically created 
for educational purposes.
+
+We encourage you to take a look, explore these data sets and use them while 
learning Apache Beam.
+
+### Creating a PCollection from in-memory data
+
+You can use the Beam-provided Create transform to create a `PCollection` from 
an in-memory Go collection. You can apply the Create transform directly to your 
Pipeline object itself.
+
+The following example code shows how to do this:
+
+```
+func main() {
+    ctx := context.Background()
+
+    // First create pipline
+    p, s := beam.NewPipelineWithRoot()
+
+    //Now create the PCollection using list of strings
+    numbers := beam.Create(s, "To", "be", "or", "not", "to", "be","that", 
"is", "the", "question")

Review Comment:
   The variable `numbers` should be named `strings` or `hamlet` here.



##########
learning/tour-of-beam/learning-content/go/introduction/introduction-concepts/pipeline-concepts/setting-pipeline/description.md:
##########
@@ -0,0 +1,63 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+### Configuring pipeline options
+
+Use the pipeline options to configure different aspects of your pipeline, such 
as the pipeline runner that will execute your pipeline and any runner-specific 
configuration required by the chosen runner. Your pipeline options will 
potentially include information such as your project ID or a location for 
storing files.
+
+### Setting PipelineOptions from command-line arguments
+
+Use Go flags to parse command line arguments to configure your pipeline. Flags 
must be parsed before `beam.Init()` is called.
+
+```
+// If beamx or Go flags are used, flags must be parsed first,
+// before beam.Init() is called.
+flag.Parse()
+```
+
+This interprets command-line arguments that follow the format:
+
+```
+--<option>=<value>
+```
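As a standalone sketch of that parsing behavior (standard library only, with a hypothetical `input` option; no Beam dependency), arguments in the `--<option>=<value>` form are handled like this:

```go
package main

import (
	"flag"
	"fmt"
)

// parseInput parses arguments in the --<option>=<value> form and returns
// the value of a hypothetical "input" option, falling back to a default.
func parseInput(args []string) string {
	fs := flag.NewFlagSet("pipeline", flag.ContinueOnError)
	input := fs.String("input", "default.txt", "Input for the pipeline")
	fs.Parse(args)
	return *input
}

func main() {
	fmt.Println(parseInput([]string{"--input=gs://my-bucket/data.txt"}))
	// → gs://my-bucket/data.txt
}
```

Go's `flag` package accepts both `-option=value` and `--option=value`, which is why Beam pipelines can use the double-dash convention shown above.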
+
+### Creating custom options
+
+You can add your own custom options in addition to the standard 
`PipelineOptions`.
+
+The following example shows how to add `input` and `output` custom options:
+
+```
+// Use standard Go flags to define pipeline options.
+var (
+  input  = flag.String("input", "gs://my-bucket/input", "Input for the 
pipeline")
+  output = flag.String("output", "gs://my-bucket/output", "Output for the 
pipeline")
+)
+```
+
+### Playground exercise
+
+You can find the full code of the above example in the playground window, 
which you can run and experiment with.
+
+You can also work with files of other formats. For example, a csv file with taxi order data. After applying some transformations, you can write the results to a new csv file:
+```
+var (
+  input = flag.String("input", 
"gs://apache-beam-samples/nyc_taxi/misc/sample1000.csv", "File(s) to read.")
+
+  output = flag.String("output", "output.csv", "Output file (required).")
+)
+```
+
+You can preview the [file](https://storage.googleapis.com/apache-beam-samples/nyc_taxi/misc/sample1000.csv)
+
+Do you notice the order that elements of the `PCollection` appear in the console? Why is that? You can also run the example several times to see if the output stays the same or changes.

Review Comment:
   Do you notice the order that elements of the PCollection appear in the 
console?



##########
learning/tour-of-beam/learning-content/go/introduction/introduction-concepts/creating-collections/from-memory/description.md:
##########
@@ -0,0 +1,56 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+### Creating PCollection
+
+Now that you know how to create a Beam pipeline and pass parameters into it, 
it is time to learn how to create an initial `PCollection` and fill it with 
data.
+
+There are several options:
+
+→ You can create a `PCollection` of data stored in an in-memory collection in your driver program.
+
+→ You can also read data from a variety of external sources such as local and cloud-based files, databases, or other sources using Beam-provided I/O adapters.
+
+Throughout the tour, most of the examples use either a `PCollection` created from in-memory data or data read from one of the cloud buckets: beam-examples or dataflow-samples. These buckets contain sample data sets created specifically for educational purposes.
+
+We encourage you to take a look at these data sets, explore them, and use them while learning Apache Beam.
+
+### Creating a PCollection from in-memory data
+
+You can use the Beam-provided `Create` transform to build a `PCollection` from an in-memory Go collection. You apply the `Create` transform by passing it the pipeline's root `Scope`.
+
+The following example code shows how to do this:
+
+```
+func main() {
+    // First create the pipeline
+    p, s := beam.NewPipelineWithRoot()
+
+    // Now create the PCollection from a list of strings
+    hamlet := beam.Create(s, "To", "be", "or", "not", "to", "be", "that", "is", "the", "question")
+
+    // Create a numerical PCollection
+    numbers := beam.Create(s, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
+}
+```
+
+### Playground exercise
+
+You can find the complete code of this example in the playground window, which you can run and experiment with.
+
+One difference you will notice is that it also contains a function to output `PCollection` elements to the console. Don’t worry if you don’t quite understand it, as the concept of `ParDo` transforms will be explained later in the course. Feel free, however, to use it in exercises and challenges to explore results.

Review Comment:
   One difference you will notice is that it also contains a function to output 
`PCollection` elements to the console. Don’t worry if you don’t quite 
understand it, as the concept of `ParDo` transforms will be explained later in 
the course. Feel free, however, to use it in exercises and challenges to 
explore results.



##########
learning/tour-of-beam/learning-content/go/introduction/introduction-concepts/pipeline-concepts/overview-pipeline/description.md:
##########
@@ -0,0 +1,43 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+### Overview
+
+To use Beam, you first need to create a driver program using the classes 
in one of the Beam SDKs. Your driver program defines your pipeline, including 
all of the inputs, transforms, and outputs. It also sets execution options for 
your pipeline (typically passed by using command-line options). These include 
the Pipeline Runner, which, in turn, determines what back-end your pipeline 
will run on.
+
+The Beam SDKs provide several abstractions that simplify the mechanics of 
large-scale distributed data processing. The same Beam abstractions work with 
both batch and streaming data sources. When you create your Beam pipeline, you 
can think about your data processing task in terms of these abstractions. They 
include:
+
+→ `Pipeline`: A Pipeline encapsulates your entire data processing task, from 
start to finish. This includes reading input data, transforming that data, and 
writing output data. All Beam driver programs must create a Pipeline. When you 
create the Pipeline, you must also specify the execution options that tell the 
Pipeline where and how to run.
+
+→ `PCollection`: A PCollection represents a distributed data set that your 
Beam pipeline operates on. The data set can be bounded, meaning it comes from a 
fixed source like a file, or unbounded, meaning it comes from a continuously 
updating source via a subscription or other mechanism. Your pipeline typically 
creates an initial PCollection by reading data from an external data source, 
but you can also create a PCollection from in-memory data within your driver 
program. From there, PCollections are the inputs and outputs for each step in 
your pipeline.
+
+→ `PTransform`: A PTransform represents a data processing operation, or a 
step, in your pipeline. Every PTransform takes one or more PCollection objects 
as the input, performs a processing function that you provide on the elements 
of that PCollection, and then produces zero or more output PCollection objects.
+
+→ `Scope`: The Go SDK has an explicit scope variable used to build a `Pipeline`. A `Pipeline` can return its root scope with the `Root()` method. The scope variable is then passed to `PTransform` functions, which place them in the `Pipeline` that owns the `Scope`.
+
+→ `I/O transforms`: Beam comes with a number of “IOs” - library PTransforms 
that read or write data to various external storage systems.
+
+A typical Beam driver program works as follows:
+
+→ Create a Pipeline object and set the pipeline execution options, including 
the Pipeline Runner.
+
+→ Create an initial `PCollection` for pipeline data, either using the IOs to 
read data from an external storage system, or using a Create transform to build 
a `PCollection` from in-memory data.
+
+→ Apply `PTransforms` to each `PCollection`. Transforms can change, filter, group, analyze, or otherwise process the elements in a PCollection. A transform creates a new output PCollection without modifying the input collection, and a transform is always applied to every element of the input PCollection. A typical pipeline applies subsequent transforms to each new output PCollection in turn until the processing is complete. However, note that a pipeline does not have to be a single straight line of transforms applied one after another: think of PCollections as variables and PTransforms as functions applied to these variables: the shape of the pipeline can be an arbitrarily complex processing graph.

Review Comment:
   A transform creates a new output PCollection without modifying the input 
collection, and the transform is always applied to every element of the input 
PCollection.



##########
learning/tour-of-beam/learning-content/go/introduction/introduction-concepts/runner-concepts/description.md:
##########
@@ -0,0 +1,60 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+### Overview
+
+Apache Beam provides a portable API layer for building sophisticated 
data-parallel processing `pipelines` that may be executed across a diversity of 
execution engines, or `runners`. The core concepts of this layer are based upon 
the Beam Model (formerly referred to as the Dataflow Model), and implemented to 
varying degrees in each Beam `runner`.
+
+### Direct runner
+The Direct Runner executes pipelines on your machine and is designed to 
validate that pipelines adhere to the Apache Beam model as closely as possible. 
Instead of focusing on efficient pipeline execution, the Direct Runner performs 
additional checks to ensure that users do not rely on semantics that are not 
guaranteed by the model. Some of these checks include:
+
+* enforcing immutability of elements
+* enforcing encodability of elements
+* ensuring that elements are processed in an arbitrary order at all points
+* ensuring serialization of user functions (DoFn, CombineFn, etc.)
+
+Using the Direct Runner for testing and development helps ensure that pipelines are robust across different Beam runners. In addition, debugging failed runs can be a non-trivial task when a pipeline executes on a remote cluster. Instead, it is often faster and simpler to perform local unit testing on your pipeline code. Unit testing your pipeline locally also allows you to use your preferred local debugging tools. In the Go SDK, the default runner is the **DirectRunner**.

Review Comment:
   In the Go SDK, the default is runner **DirectRunner**.



##########
learning/tour-of-beam/learning-content/go/introduction/introduction-guide/description.md:
##########
@@ -0,0 +1,22 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+# Tour of Beam Programming Guide
+
+Welcome to the Tour of Beam, a learning guide you can use to familiarize yourself with Apache Beam.
+The tour is divided into a list of modules that contain learning units covering various Apache Beam features and principles.
+You can access the full list of modules by clicking the ‘<<’ button on the left. For each module, learning progress is displayed next to it.

Review Comment:
   You can access the full list of modules



##########
learning/tour-of-beam/learning-content/go/introduction/introduction-concepts/runner-concepts/description.md:
##########
@@ -0,0 +1,60 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+### Overview
+
+Apache Beam provides a portable API layer for building sophisticated 
data-parallel processing `pipelines` that may be executed across a diversity of 
execution engines, or `runners`. The core concepts of this layer are based upon 
the Beam Model (formerly referred to as the Dataflow Model), and implemented to 
varying degrees in each Beam `runner`.
+
+### Direct runner
+The Direct Runner executes pipelines on your machine and is designed to 
validate that pipelines adhere to the Apache Beam model as closely as possible. 
Instead of focusing on efficient pipeline execution, the Direct Runner performs 
additional checks to ensure that users do not rely on semantics that are not 
guaranteed by the model. Some of these checks include:

Review Comment:
   I think we should use the Portable Runner for examples, since we intend to 
have x-lang examples eventually. If the portable Runner doesn't work for an 
example we can give the direct runner flags.



##########
learning/tour-of-beam/learning-content/go/introduction/introduction-guide/description.md:
##########
@@ -0,0 +1,22 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+# Tour of Beam Programming Guide
+
+Welcome to the Tour of Beam, a learning guide you can use to familiarize yourself with Apache Beam.
+The tour is divided into a list of modules that contain learning units covering various Apache Beam features and principles.
+You can access the full list of modules by clicking the ‘<<’ button on the left. For each module, learning progress is displayed next to it.
+Throughout the tour, you will find learning materials, examples, exercises and challenges for you to complete.
+Learning units are accompanied by code examples that you can review in the upper right pane. You can edit the code, or just run the example by clicking ‘Run’. Output is displayed in the lower right pane.

Review Comment:
   Learning units are accompanied by code examples that you can review in the 
upper right pane. You can edit the code, or just run the example by clicking 
the ‘Run’. Output is displayed in the lower right pane.



##########
learning/tour-of-beam/learning-content/go/introduction/introduction-terms/description.md:
##########
@@ -0,0 +1,38 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+**Pipeline** - A pipeline is a user-constructed graph of transformations that 
defines the desired data processing operations.
+
+**PCollection** - A PCollection is a data set or data stream. The data that a 
pipeline processes is part of a PCollection.
+
+**PTransform** - A PTransform (or transform) represents a data processing 
operation, or a step, in your pipeline. A transform is applied to zero or more 
PCollection objects, and produces zero or more PCollection objects.
+
+**Aggregation** - Aggregation is computing a value from multiple (1 or more) 
input elements.
+
+**User-defined function (UDF)** - Some Beam operations allow you to run 
user-defined code as a way to configure the transform.
+
+**Schema** - A schema is a language-independent type definition for a 
PCollection. The schema for a PCollection defines elements of that PCollection 
as an ordered list of named fields.
+
+**SDK** - A language-specific library that lets pipeline authors build 
transforms, construct their pipelines, and submit them to a runner.
+
+**Runner** - A runner runs a Beam pipeline using the capabilities of your 
chosen data processing engine.
+
+**Window** - A PCollection can be subdivided into windows based on the 
timestamps of the individual elements. Windows enable grouping operations over 
collections that grow over time by dividing the collection into windows of 
finite collections.
+
+**Watermark** - A watermark is a guess as to when all data in a certain window is expected to have arrived. This is needed because data isn’t always guaranteed to arrive in a pipeline in event time order, or to always arrive at predictable intervals.

Review Comment:
   This is needed because data isn’t always guaranteed to arrive in a pipeline 
in event time order, or to always arrive at predictable intervals.



##########
learning/tour-of-beam/learning-content/go/introduction/introduction-concepts/creating-collections/reading-from-csv/example/csvExample.go:
##########
@@ -0,0 +1,89 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+// beam-playground:
+//   name: CSV
+//   description: CSV example.
+//   multifile: false
+//   context_line: 44
+//   categories:
+//     - Quickstart
+//   complexity: BASIC
+//   tags:
+//     - hellobeam
+
+package main
+
+import (
+    "context"
+    "fmt"
+    "github.com/apache/beam/sdks/v2/go/pkg/beam"
+    "github.com/apache/beam/sdks/v2/go/pkg/beam/io/textio"
+    "github.com/apache/beam/sdks/v2/go/pkg/beam/log"
+    "github.com/apache/beam/sdks/v2/go/pkg/beam/x/beamx"
+    "strconv"
+    "strings"
+    "github.com/apache/beam/sdks/v2/go/pkg/beam/transforms/top"
+
+)
+
+// top.Largest expects a comparator that reports whether a is less than b.
+func less(a, b float64) bool {
+    return a < b
+}
+
+func main() {
+    p, s := beam.NewPipelineWithRoot()
+
+    file := textio.Read(s, "gs://apache-beam-samples/nyc_taxi/misc/sample1000.csv")
+
+    cost := applyTransform(s, file)
+
+    // top.Largest returns a PCollection whose single element is a slice of
+    // the 10 largest cost values, as ordered by the less comparator.
+    fixedSizeElements := top.Largest(s, cost, 10, less)

Review Comment:
   Please add a comment explaining this



##########
learning/tour-of-beam/learning-content/go/introduction/introduction-guide/description.md:
##########
@@ -0,0 +1,22 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+# Tour of Beam Programming Guide
+
+Welcome to the Tour of Beam, a learning guide you can use to familiarize yourself with Apache Beam.
+The tour is divided into a list of modules that contain learning units covering various Apache Beam features and principles.
+You can access the full list of modules by clicking the ‘<<’ button on the left. For each module, learning progress is displayed next to it.
+Throughout the tour, you will find learning materials, examples, exercises and challenges for you to complete.
+Learning units are accompanied by code examples that you can review in the upper right pane. You can edit the code, or just run the example by clicking ‘Run’. Output is displayed in the lower right pane.
+Each module also contains a challenge based on the material learned. Try to solve as many as you can, and if you need help, just click on the ‘Hint’ button or examine the correct solution by clicking the ‘Solution’ button.
+Now let’s start the tour by learning some core Beam principles.

Review Comment:
   Now let’s start the tour by learning some core Beam principles.



##########
learning/tour-of-beam/learning-content/go/introduction/introduction-concepts/creating-collections/reading-from-text/description.md:
##########
@@ -0,0 +1,41 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+### Reading from text file
+
+You can use one of the Beam-provided I/O adapters to read from an external source. The adapters vary in their exact usage, but all of them read from some external data source and return a `PCollection` whose elements represent the data in that source.

Review Comment:
   You can use one of the Beam-provided I/O adapters to read from an external 
source. The adapters vary in their exact usage, but all of them read from some 
external data source and return a `PCollection` whose elements represent the 
data in that source.



##########
learning/tour-of-beam/learning-content/go/introduction/introduction-concepts/creating-collections/reading-from-csv/description.md:
##########
@@ -0,0 +1,33 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+### Read from csv file
+
+Data processing pipelines often work with tabular data. In many examples and challenges throughout the course, you’ll be working with one of the datasets stored as csv files in the "beam-examples" or "dataflow-samples" buckets.

Review Comment:
   as csv files in the "beam-examples" or "dataflow-samples" buckets.



##########
learning/tour-of-beam/learning-content/go/introduction/introduction-concepts/pipeline-concepts/creating-pipeline/description.md:
##########
@@ -0,0 +1,36 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+### Creating a pipeline
+
+The `Pipeline` abstraction encapsulates all the data and steps in your data 
processing task. Your Beam driver program typically starts by constructing a 
Pipeline object, and then using that object as the basis for creating the 
pipeline’s data sets as PCollections and its operations as `Transforms`.
+
+To use Beam, your driver program must first create an instance of the Beam SDK 
class Pipeline (typically in the main() function). When you create your 
`Pipeline`, you’ll also need to set some configuration options. You can set 
your pipeline’s configuration options programmatically, but it’s often easier 
to set the options ahead of time (or read them from the command line) and pass 
them to the Pipeline object when you create the object.
+
+```
+// beam.Init() is an initialization hook that must be called
+// near the beginning of main(), before creating a pipeline.
+beam.Init()
+
+// Create the Pipeline object and root scope.
+pipeline, scope := beam.NewPipelineWithRoot()
+```
+
+### Playground exercise
+
+You can find the full code of the above example in the playground window, where you can run the pipeline and experiment with it. You can create a `pipeline` and `scope` separately, as an alternative to using `beam.NewPipelineWithRoot()`. This can be convenient if manipulations are needed before creating an element.

Review Comment:
   You can find the full code of the above example in the playground window, 
where you can run the pipeline and experiment with it. You can create a 
`pipeline` and `scope` separately, as an alternative to using 
`beam.NewPipelineWithRoot()`. This can be convenient if manipulations are 
needed before creating an element.



##########
learning/tour-of-beam/learning-content/go/introduction/introduction-concepts/creating-collections/reading-from-text/description.md:
##########
@@ -0,0 +1,41 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+### Reading from text file
+
+You can use one of the Beam-provided I/O adapters to read from an external source. The adapters vary in their exact usage, but all of them read from some external data source and return a `PCollection` whose elements represent the data in that source.
+
+Each data source adapter has a Read transform; to read, you must apply that 
transform to the Pipeline object itself.
+
+`textio.Read`, for example, reads from an external text file and returns a `PCollection` whose elements are of type `string`. Each string represents one line from the text file. Here’s how you would apply `textio.Read` in your pipeline to create a `PCollection`:
+
+```
+func main() {
+    // First create the pipeline
+    p, s := beam.NewPipelineWithRoot()
+
+    // Now create the PCollection by reading a text file. A separate
+    // element will be added for each line in the input file.
+    lines := textio.Read(s, "gs://some/inputData.txt")
+}
+```
+
+### Playground exercise
+
+In the playground window, you can find an example that reads the Shakespeare play King Lear from a text file stored in a Google Storage bucket and fills a PCollection first with individual lines and then with individual words. Try it out and see what the output is.
+
+One of the differences you will see is that the output is much shorter than the input file itself. This is because the number of elements in the output `PCollection` is limited with the `top.Largest(s, lines, 10, less)` transform. Another technique you can use to limit the output sent to the console for debugging purposes is the `Sample.fixedSizeGlobally` transform.
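The idea behind a largest-N transform can be sketched in plain Go (not Beam code), using the standard `sort` package on a slice instead of a PCollection:

```go
package main

import (
	"fmt"
	"sort"
)

// largestN returns the n largest values, mirroring what top.Largest does
// for a PCollection (here applied to a plain slice).
func largestN(values []float64, n int) []float64 {
	out := append([]float64(nil), values...) // copy; don't modify the input
	sort.Sort(sort.Reverse(sort.Float64Slice(out)))
	if n > len(out) {
		n = len(out)
	}
	return out[:n]
}

func main() {
	fmt.Println(largestN([]float64{3.5, 9.1, 0.2, 7.7}, 2))
	// → [9.1 7.7]
}
```

Keeping only the top N elements this way is what makes the console output so much shorter than the input.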

Review Comment:
   One of the differences you will see is that the output is much shorter than 
the input file itself. This is because the number of elements in the output 
`PCollection` is limited with the `top.Largest(s,lines,10,less)` transform. 
Another technique you can use to limit the output sent to the console for 
debugging purposes is the Sample.fixedSizeGlobally transform.



##########
learning/tour-of-beam/learning-content/go/introduction/introduction-concepts/creating-collections/reading-from-text/description.md:
##########
@@ -0,0 +1,41 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+### Reading from text file
+
+You can use one of the Beam-provided I/O adapters to read from an external source. The adapters vary in their exact usage, but all of them read from some external data source and return a `PCollection` whose elements represent the data in that source.
+
+Each data source adapter has a Read transform; to read, you must apply that 
transform to the Pipeline object itself.
+
+`textio.Read`, for example, reads from an external text file and returns a `PCollection` whose elements are of type `string`. Each string represents one line from the text file. Here’s how you would apply `textio.Read` in your pipeline to create a `PCollection`:
+
+```
+func main() {
+    ctx := context.Background()
+
+    // First create the pipeline.
+    p, s := beam.NewPipelineWithRoot()
+
+    // Now create the PCollection by reading text files. A separate element
+    // will be added for each line in the input file.
+    lines := textio.Read(s, "gs://some/inputData.txt")
+    _ = lines // apply further transforms to lines here
+
+    // Execute the pipeline.
+    if err := beamx.Run(ctx, p); err != nil {
+        log.Exitf(ctx, "Failed to execute job: %v", err)
+    }
+}
+```
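Downstream transforms then typically split each line element into words. Outside Beam, that per-line step looks roughly like this plain-Go sketch (the regular expression mirrors the playground example; the sample line is illustrative):

```go
package main

import (
	"fmt"
	"regexp"
)

// wordRE matches word-like tokens, including simple contractions such as
// "don't", the same way the playground example tokenizes lines.
var wordRE = regexp.MustCompile(`[a-zA-Z]+('[a-z])?`)

func main() {
	line := "Blow, winds, and crack your cheeks!"
	// FindAllString extracts every word-like token from the line.
	words := wordRE.FindAllString(line, -1)
	fmt.Println(words)
}
```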
+
+### Playground exercise
+
+In the playground window, you can find an example that reads a king lear poem 
from the text file stored in the Google Storage bucket and fills PCollection 
with individual lines and then with individual words. Try it out and see what 
the output is.

Review Comment:
   In the playground window, you can find an example that reads the Shakespeare 
play King Lear from the text file stored in the Google Storage bucket and fills 
PCollection with individual lines and then with individual words. Try it out 
and see what the output is.



##########
learning/tour-of-beam/learning-content/go/introduction/introduction-concepts/creating-collections/reading-from-text/example/textIo.go:
##########
@@ -0,0 +1,97 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+*/
+// beam-playground:
+//   name: TextIO
+//   description: TextIO example.
+//   multifile: false
+//   context_line: 46
+//   categories:
+//     - Quickstart
+//   complexity: BASIC
+//   tags:
+//     - hellobeam
+
+package main
+
+import (
+    "context"
+    "fmt"
+    "github.com/apache/beam/sdks/v2/go/pkg/beam"
+    "github.com/apache/beam/sdks/v2/go/pkg/beam/io/textio"
+    "github.com/apache/beam/sdks/v2/go/pkg/beam/log"
+    "github.com/apache/beam/sdks/v2/go/pkg/beam/transforms/filter"
+    "github.com/apache/beam/sdks/v2/go/pkg/beam/x/beamx"
+    "regexp"
+    "github.com/apache/beam/sdks/v2/go/pkg/beam/transforms/top"
+)
+
+var (
+    wordRE = regexp.MustCompile(`[a-zA-Z]+('[a-z])?`)
+)
+
+func less(a, b string) bool {
+    return len(a) > len(b)
+}
+
+func main() {
+    p, s := beam.NewPipelineWithRoot()
+
+    file := Read(s, "gs://apache-beam-samples/shakespeare/kinglear.txt")
+
+    lines := getLines(s, file)
+    fixedSizeLines := top.Largest(s, lines, 10, less)
+    output(s, "Lines: ", fixedSizeLines)
+
+    words := getWords(s, lines)
+    fixedSizeWords := top.Largest(s, words, 10, less)
+    output(s, "Words: ", fixedSizeWords)
+
+    err := beamx.Run(context.Background(), p)
+    if err != nil {
+        log.Exitf(context.Background(), "Failed to execute job: %v", err)
+    }
+}
+
+// Read reads from the filename(s) specified by a glob string and returns a 
PCollection<string>.
+func Read(s beam.Scope, glob string) beam.PCollection {
+    return textio.Read(s, glob)
+}
+
+// getLines filters out empty lines. The resulting PCollection contains one 
element per non-empty line of text from the input file.
+func getLines(s beam.Scope, input beam.PCollection) beam.PCollection {
+    return filter.Include(s, input, func(element string) bool {
+        return element != ""
+    })
+}
+
+// getWords read text lines and split into PCollection of words.

Review Comment:
   getWords reads text lines and splits them into PCollection of words.



##########
learning/tour-of-beam/learning-content/go/introduction/introduction-concepts/runner-concepts/description.md:
##########
@@ -0,0 +1,60 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+### Overview
+
+Apache Beam provides a portable API layer for building sophisticated 
data-parallel processing `pipelines` that may be executed across a diversity of 
execution engines, or `runners`. The core concepts of this layer are based upon 
the Beam Model (formerly referred to as the Dataflow Model), and implemented to 
varying degrees in each Beam `runner`.
+
+### Direct runner
+The Direct Runner executes pipelines on your machine and is designed to 
validate that pipelines adhere to the Apache Beam model as closely as possible. 
Instead of focusing on efficient pipeline execution, the Direct Runner performs 
additional checks to ensure that users do not rely on semantics that are not 
guaranteed by the model. Some of these checks include:
+
+* enforcing immutability of elements
+* enforcing encodability of elements
+* elements are processed in an arbitrary order at all points
+* serialization of user functions (DoFn, CombineFn, etc.)
+
+Using the Direct Runner for testing and development helps ensure that 
pipelines are robust across different Beam runners. In addition, debugging 
failed runs can be a non-trivial task when a pipeline executes on a remote 
cluster. Instead, it is often faster and simpler to perform local unit testing 
on your pipeline code. Unit testing your pipeline locally also allows you to 
use your preferred local debugging tools. In the Go SDK, the default runner is 
**DirectRunner**.
+
+Additionally, you can read 
[here](https://beam.apache.org/documentation/runners/direct/)

Review Comment:
   Additionally, you can read -> You can read more 



##########
learning/tour-of-beam/learning-content/go/introduction/introduction-concepts/pipeline-concepts/overview-pipeline/description.md:
##########
@@ -0,0 +1,43 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+### Overview
+
+To use Beam, you first need to create a driver program using the classes 
in one of the Beam SDKs. Your driver program defines your pipeline, including 
all of the inputs, transforms, and outputs. It also sets execution options for 
your pipeline (typically passed by using command-line options). These include 
the Pipeline Runner, which, in turn, determines what back-end your pipeline 
will run on.
+
+The Beam SDKs provide several abstractions that simplify the mechanics of 
large-scale distributed data processing. The same Beam abstractions work with 
both batch and streaming data sources. When you create your Beam pipeline, you 
can think about your data processing task in terms of these abstractions. They 
include:
+
+→ `Pipeline`: A Pipeline encapsulates your entire data processing task, from 
start to finish. This includes reading input data, transforming that data, and 
writing output data. All Beam driver programs must create a Pipeline. When you 
create the Pipeline, you must also specify the execution options that tell the 
Pipeline where and how to run.
+
+→ `PCollection`: A PCollection represents a distributed data set that your 
Beam pipeline operates on. The data set can be bounded, meaning it comes from a 
fixed source like a file, or unbounded, meaning it comes from a continuously 
updating source via a subscription or other mechanism. Your pipeline typically 
creates an initial PCollection by reading data from an external data source, 
but you can also create a PCollection from in-memory data within your driver 
program. From there, PCollections are the inputs and outputs for each step in 
your pipeline.
+
+→ `PTransform`: A PTransform represents a data processing operation, or a 
step, in your pipeline. Every PTransform takes one or more PCollection objects 
as the input, performs a processing function that you provide on the elements 
of that PCollection, and then produces zero or more output PCollection objects.

Review Comment:
   Every PTransform takes one or more PCollection objects as the input, 
performs a user provided processing function on all the elements of that 
PCollection, and then produces zero or more output PCollection objects. It is 
crucial to understand that the PTransform is applied to each element of the 
PCollection independently. 



##########
learning/tour-of-beam/learning-content/go/introduction/introduction-guide/description.md:
##########
@@ -0,0 +1,22 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+# Tour of Beam Programming Guide
+
+Welcome to a Tour Of Beam, a learning guide you can use to familiarize 
yourself with Apache Beam.
+The tour is divided into a list of modules that contain learning units 
covering various Apache Beam features and principles.
+You can access the list of modules by clicking the ‘<<’ button on the left. 
Learning progress is displayed next to each module.
+Throughout the tour, you will find list of learning materials, examples, 
exercises and challenges for you to complete.

Review Comment:
   Throughout the tour, you will find learning materials, examples, exercises 
and challenges for you to complete.



##########
learning/tour-of-beam/learning-content/go/introduction/introduction-concepts/runner-concepts/description.md:
##########
@@ -0,0 +1,60 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+### Overview
+
+Apache Beam provides a portable API layer for building sophisticated 
data-parallel processing `pipelines` that may be executed across a diversity of 
execution engines, or `runners`. The core concepts of this layer are based upon 
the Beam Model (formerly referred to as the Dataflow Model), and implemented to 
varying degrees in each Beam `runner`.
+
+### Direct runner
+The Direct Runner executes pipelines on your machine and is designed to 
validate that pipelines adhere to the Apache Beam model as closely as possible. 
Instead of focusing on efficient pipeline execution, the Direct Runner performs 
additional checks to ensure that users do not rely on semantics that are not 
guaranteed by the model. Some of these checks include:
+
+* enforcing immutability of elements
+* enforcing encodability of elements
+* elements are processed in an arbitrary order at all points
+* serialization of user functions (DoFn, CombineFn, etc.)
+
+Using the Direct Runner for testing and development helps ensure that 
pipelines are robust across different Beam runners. In addition, debugging 
failed runs can be a non-trivial task when a pipeline executes on a remote 
cluster. Instead, it is often faster and simpler to perform local unit testing 
on your pipeline code. Unit testing your pipeline locally also allows you to 
use your preferred local debugging tools. In the Go SDK, the default runner is 
**DirectRunner**.
+
+Additionally, you can read 
[here](https://beam.apache.org/documentation/runners/direct/)
+
+#### Run example
+
+```
+$ go install github.com/apache/beam/sdks/v2/go/examples/wordcount
+$ wordcount --input <PATH_TO_INPUT_FILE> --output counts
+```
+
+### Google Cloud Dataflow runner
+
+The Google Cloud Dataflow Runner uses the Cloud Dataflow managed service. When 
you run your pipeline with the Cloud Dataflow service, the runner uploads your 
executable code and dependencies to a Google Cloud Storage bucket and creates a 
Cloud Dataflow job, which executes your pipeline on managed resources in Google 
Cloud Platform. The Cloud Dataflow Runner and service are suitable for large 
scale, continuous jobs, and provide:
+* a fully managed service
+* autoscaling of the number of workers throughout the lifetime of the job
+* dynamic work rebalancing
+
+Additionally, you can read 
[here](https://beam.apache.org/documentation/runners/dataflow/)

Review Comment:
   You can read more



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
