rszper commented on code in PR #29507: URL: https://github.com/apache/beam/pull/29507#discussion_r1411430656
########## learning/prompts/documentation-lookup/13_advanced_splittable_dofn.md: ########## @@ -0,0 +1,18 @@ +Prompt: +What is Splittable DoFn in Apache Beam? +Response: +Splittable DoFn (SDF) is a generalization of [DoFn](https://beam.apache.org/documentation/programming-guide/#pardo) that lets you process elements in a non-monolithic way. Splittable DoFn makes creating complex, modular I/O connectors in Beam easier. +When you apply a splittable DoFn to an element, the runner can split the element’s processing into smaller tasks. You can checkpoint the processing of an element, and you can split the remaining work to yield additional parallelism. + +At a high level, an SDF is responsible for processing element and restriction pairs. A restriction represents a subset of work that would have been necessary to have been done when processing the element. + +Executing an [Splittable DoFn](https://beam.apache.org/documentation/programming-guide/#splittable-dofns) follows the following steps: +1. Each element is paired with a restriction (e.g. filename is paired with offset range representing the whole file). Review Comment: ```suggestion 1. Each element is paired with a restriction. For example, a filename is paired with an offset range representing the whole file. ``` ########## learning/prompts/documentation-lookup/13_advanced_splittable_dofn.md: ########## @@ -0,0 +1,18 @@ +Prompt: +What is Splittable DoFn in Apache Beam? +Response: +Splittable DoFn (SDF) is a generalization of [DoFn](https://beam.apache.org/documentation/programming-guide/#pardo) that lets you process elements in a non-monolithic way. Splittable DoFn makes creating complex, modular I/O connectors in Beam easier. +When you apply a splittable DoFn to an element, the runner can split the element’s processing into smaller tasks. You can checkpoint the processing of an element, and you can split the remaining work to yield additional parallelism. Review Comment: ```suggestion When you apply a splittable `DoFn` to an element, the runner can split the element’s processing into smaller tasks. You can checkpoint the processing of an element, and you can split the remaining work to yield additional parallelism. ``` ########## learning/prompts/documentation-lookup/13_advanced_splittable_dofn.md: ########## @@ -0,0 +1,18 @@ +Prompt: +What is Splittable DoFn in Apache Beam? +Response: +Splittable DoFn (SDF) is a generalization of [DoFn](https://beam.apache.org/documentation/programming-guide/#pardo) that lets you process elements in a non-monolithic way. Splittable DoFn makes creating complex, modular I/O connectors in Beam easier. +When you apply a splittable DoFn to an element, the runner can split the element’s processing into smaller tasks. You can checkpoint the processing of an element, and you can split the remaining work to yield additional parallelism. + +At a high level, an SDF is responsible for processing element and restriction pairs. A restriction represents a subset of work that would have been necessary to have been done when processing the element. + +Executing an [Splittable DoFn](https://beam.apache.org/documentation/programming-guide/#splittable-dofns) follows the following steps: +1. Each element is paired with a restriction (e.g. filename is paired with offset range representing the whole file). +2. Each element and restriction pair is split (e.g. offset ranges are broken up into smaller pieces). +3. The runner redistributes the element and restriction pairs to several workers. +4. 
Element and restriction pairs are processed in parallel (e.g. the file is read). Within this last step, the element and restriction pair can pause its own processing and/or be split into further element and restriction pairs. + +See Tour of Beam [Splittable DoFn module](https://tour.beam.apache.org/tour/python/splittable-dofn/splittable) for a practical example. + +See [community blogpost](https://beam.apache.org/blog/splittable-do-fn-is-available/) for more information. Review Comment: ```suggestion For more information, see the [community blog post](https://beam.apache.org/blog/splittable-do-fn-is-available/). ``` ########## learning/prompts/documentation-lookup/15_advanced_xlang.md: ########## @@ -0,0 +1,15 @@ +Prompt: +What is a multi-language pipeline in Apache Beam? +Response: +Beam lets you combine transforms written in any supported SDK language (currently, [Java](https://beam.apache.org/documentation/programming-guide/#1311-creating-cross-language-java-transforms) and [Python](https://beam.apache.org/documentation/programming-guide/#1312-creating-cross-language-python-transforms)) and use them in one multi-language pipeline. For example, a pipeline that reads from a Python source processes the data using a Java transform, and writes the data to a Python sink is a multi-language pipeline. Review Comment: ```suggestion Beam lets you combine transforms written in any supported SDK language (currently, [Java](https://beam.apache.org/documentation/programming-guide/#1311-creating-cross-language-java-transforms) and [Python](https://beam.apache.org/documentation/programming-guide/#1312-creating-cross-language-python-transforms)) and use them in one multi-language pipeline. For example, a multi-language pipeline might read from a Python source, process the data using a Java transform, and write the data to a Python sink. ``` ########## learning/prompts/documentation-lookup/16_advanced_pipeline_lifecycle.md: ########## @@ -0,0 +1,36 @@ +Prompt: +What is a pipeline development lifecycle in Apache Beam? +Response: + +The Apache Beam pipeline development lifecycle is an iterative process that usually involves the following steps: + +- Design your pipeline. +- Develop your pipeline code. +- Test your pipeline. +- Deploy your pipeline. + +On each iteration, you may need to go back and forth between the different steps to refine your pipeline code and fix any bugs you find. + +Designing a pipeline addresses the following questions: +- Where is my data stored? +- What does your data look like? Review Comment: ```suggestion - What does the data look like? ``` ########## learning/prompts/documentation-lookup/16_advanced_pipeline_lifecycle.md: ########## @@ -0,0 +1,36 @@ +Prompt: +What is a pipeline development lifecycle in Apache Beam? +Response: + +The Apache Beam pipeline development lifecycle is an iterative process that usually involves the following steps: + +- Design your pipeline. +- Develop your pipeline code. +- Test your pipeline. +- Deploy your pipeline. + +On each iteration, you may need to go back and forth between the different steps to refine your pipeline code and fix any bugs you find. + +Designing a pipeline addresses the following questions: +- Where is my data stored? +- What does your data look like? +- What do you want to do with your data? +- What does your output data look like, and where should it go? 
+ +Beam documentation has more information on [pipeline design](https://beam.apache.org/documentation/pipelines/design-your-pipeline/) and common [pipeline patterns](https://beam.apache.org/documentation/patterns/overview/). Review Comment: ```suggestion Beam documentation has more information about [pipeline design](https://beam.apache.org/documentation/pipelines/design-your-pipeline/) and about common [pipeline patterns](https://beam.apache.org/documentation/patterns/overview/). ``` ########## learning/prompts/documentation-lookup/16_advanced_pipeline_lifecycle.md: ########## @@ -0,0 +1,36 @@ +Prompt: +What is a pipeline development lifecycle in Apache Beam? +Response: + +The Apache Beam pipeline development lifecycle is an iterative process that usually involves the following steps: + +- Design your pipeline. +- Develop your pipeline code. +- Test your pipeline. +- Deploy your pipeline. + +On each iteration, you may need to go back and forth between the different steps to refine your pipeline code and fix any bugs you find. + +Designing a pipeline addresses the following questions: +- Where is my data stored? +- What does your data look like? +- What do you want to do with your data? +- What does your output data look like, and where should it go? + +Beam documentation has more information on [pipeline design](https://beam.apache.org/documentation/pipelines/design-your-pipeline/) and common [pipeline patterns](https://beam.apache.org/documentation/patterns/overview/). + + +Apache Beam program expresses a data processing pipeline, from start to finish. To construct a pipeline using the classes in the Beam SDKs, your program will need to perform the following general steps: + +- Create a Pipeline object +- Use a Read or Create transform to create one or more PCollections for your pipeline data +- Apply transforms to each PCollection +- Write or otherwise output the final, transformed PCollections +- Run the pipeline + +Beam documentation has more on [developing](https://beam.apache.org/documentation/programming-guide/) and [executing](https://beam.apache.org/documentation/pipelines/create-your-pipeline/) pipelines. Review Comment: ```suggestion The Apache Beam documentation has more information about [developing](https://beam.apache.org/documentation/programming-guide/) and [executing](https://beam.apache.org/documentation/pipelines/create-your-pipeline/) pipelines. ``` ########## learning/prompts/documentation-lookup/16_advanced_pipeline_lifecycle.md: ########## @@ -0,0 +1,36 @@ +Prompt: +What is a pipeline development lifecycle in Apache Beam? +Response: + +The Apache Beam pipeline development lifecycle is an iterative process that usually involves the following steps: + +- Design your pipeline. +- Develop your pipeline code. +- Test your pipeline. +- Deploy your pipeline. + +On each iteration, you may need to go back and forth between the different steps to refine your pipeline code and fix any bugs you find. + +Designing a pipeline addresses the following questions: +- Where is my data stored? +- What does your data look like? +- What do you want to do with your data? +- What does your output data look like, and where should it go? + +Beam documentation has more information on [pipeline design](https://beam.apache.org/documentation/pipelines/design-your-pipeline/) and common [pipeline patterns](https://beam.apache.org/documentation/patterns/overview/). + + +Apache Beam program expresses a data processing pipeline, from start to finish. 
To construct a pipeline using the classes in the Beam SDKs, your program will need to perform the following general steps: + +- Create a Pipeline object +- Use a Read or Create transform to create one or more PCollections for your pipeline data +- Apply transforms to each PCollection Review Comment: ```suggestion - Apply transforms to each `PCollection`. ``` ########## learning/prompts/documentation-lookup/17_advanced_ai_ml.md: ########## @@ -0,0 +1,18 @@ +Prompt: +What are AI and ML capabilities in Apache Beam? +Response: +Apache Beam has several built-in [AI and ML capabilities](https://beam.apache.org/documentation/ml/overview/) that enable you to: +- Process large datasets for both preprocessing and model inference. +- Conduct exploratory data analysis and smoothly scale up data pipelines in production as part of your MLOps ecosystem. +- Run your models in production with varying data loads, both in batch and streaming + +See [here](https://beam.apache.org/documentation/patterns/ai-platform/) for common AI Platform integration patterns in Apache Beam. + +The recommended way to implement inference in Apache Beam is by using the [RunInference API](https://beam.apache.org/documentation/sdks/python-machine-learning/). See [here](https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/run_inference_pytorch_tensorflow_sklearn.ipynb) for more details on how to use RunInference for PyTorch, scikit-learn, and TensorFlow. Review Comment: ```suggestion The recommended way to implement inference in Apache Beam is by using the [RunInference API](https://beam.apache.org/documentation/sdks/python-machine-learning/). For more information about how to use RunInference for PyTorch, scikit-learn, and TensorFlow, see the [Use RunInference in Apache Beam](https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/run_inference_pytorch_tensorflow_sklearn.ipynb) example in GitHub. ``` ########## learning/prompts/documentation-lookup/19_io_pubsub.md: ########## @@ -0,0 +1,25 @@ +Prompt: +Is PubSub supported in Apache Beam? +Response: +[PubSub](https://cloud.google.com/pubsub) is a[ Google Cloud](https://cloud.google.com/) service that provides a simple, reliable, scalable, and secure real-time messaging service for sending and receiving messages between independent applications. Apache Beam provides a PubSubIO connector that allows you to read and write messages from and to PubSub. Review Comment: ```suggestion Yes, Apache Beam integrates with Pub/Sub. [Pub/Sub](https://cloud.google.com/pubsub) is a [Google Cloud](https://cloud.google.com/) service that provides a simple, reliable, scalable, and secure real-time messaging service. Use Pub/Sub to send and receive messages between independent applications. Apache Beam provides a `PubSubIO` connector that lets you read and write messages to and from Pub/Sub. ``` ########## learning/prompts/documentation-lookup/14_advanced_pipeline_patterns.md: ########## @@ -0,0 +1,13 @@ +Prompt: +What pipeline patterns exist in Apache Beam? +Response: +Beam pipeline patterns are a set of best practices for building Beam pipelines. They are based on real-world Beam deployments and are designed to help you build resilient, flexible, and portable Beam pipelines. 
+ +Here are some of the most common pipeline patterns: Review Comment: ```suggestion The following list includes some of the most common pipeline patterns: ``` ########## learning/prompts/documentation-lookup/13_advanced_splittable_dofn.md: ########## @@ -0,0 +1,18 @@ +Prompt: +What is Splittable DoFn in Apache Beam? +Response: +Splittable DoFn (SDF) is a generalization of [DoFn](https://beam.apache.org/documentation/programming-guide/#pardo) that lets you process elements in a non-monolithic way. Splittable DoFn makes creating complex, modular I/O connectors in Beam easier. +When you apply a splittable DoFn to an element, the runner can split the element’s processing into smaller tasks. You can checkpoint the processing of an element, and you can split the remaining work to yield additional parallelism. + +At a high level, an SDF is responsible for processing element and restriction pairs. A restriction represents a subset of work that would have been necessary to have been done when processing the element. + +Executing an [Splittable DoFn](https://beam.apache.org/documentation/programming-guide/#splittable-dofns) follows the following steps: +1. Each element is paired with a restriction (e.g. filename is paired with offset range representing the whole file). +2. Each element and restriction pair is split (e.g. offset ranges are broken up into smaller pieces). Review Comment: ```suggestion 2. Each element and restriction pair is split. For example, offset ranges are broken up into smaller pieces. ``` ########## learning/prompts/documentation-lookup/13_advanced_splittable_dofn.md: ########## @@ -0,0 +1,18 @@ +Prompt: +What is Splittable DoFn in Apache Beam? +Response: +Splittable DoFn (SDF) is a generalization of [DoFn](https://beam.apache.org/documentation/programming-guide/#pardo) that lets you process elements in a non-monolithic way. Splittable DoFn makes creating complex, modular I/O connectors in Beam easier. +When you apply a splittable DoFn to an element, the runner can split the element’s processing into smaller tasks. You can checkpoint the processing of an element, and you can split the remaining work to yield additional parallelism. + +At a high level, an SDF is responsible for processing element and restriction pairs. A restriction represents a subset of work that would have been necessary to have been done when processing the element. + +Executing an [Splittable DoFn](https://beam.apache.org/documentation/programming-guide/#splittable-dofns) follows the following steps: +1. Each element is paired with a restriction (e.g. filename is paired with offset range representing the whole file). +2. Each element and restriction pair is split (e.g. offset ranges are broken up into smaller pieces). +3. The runner redistributes the element and restriction pairs to several workers. +4. Element and restriction pairs are processed in parallel (e.g. the file is read). Within this last step, the element and restriction pair can pause its own processing and/or be split into further element and restriction pairs. Review Comment: ```suggestion 4. Element and restriction pairs are processed in parallel. For example, the file is read. Within this last step, the element and restriction pair can pause its own processing or be split into further element and restriction pairs. ``` ########## learning/prompts/documentation-lookup/13_advanced_splittable_dofn.md: ########## @@ -0,0 +1,18 @@ +Prompt: +What is Splittable DoFn in Apache Beam? 
+Response: +Splittable DoFn (SDF) is a generalization of [DoFn](https://beam.apache.org/documentation/programming-guide/#pardo) that lets you process elements in a non-monolithic way. Splittable DoFn makes creating complex, modular I/O connectors in Beam easier. Review Comment: ```suggestion A splittable `DoFn` (SDF) is a generalization of [`DoFn`](https://beam.apache.org/documentation/programming-guide/#pardo) that lets you process elements in a non-monolithic way. A splittable `DoFn` makes it easier to create complex, modular I/O connectors in Beam. ``` ########## learning/prompts/documentation-lookup/13_advanced_splittable_dofn.md: ########## @@ -0,0 +1,18 @@ +Prompt: +What is Splittable DoFn in Apache Beam? +Response: +Splittable DoFn (SDF) is a generalization of [DoFn](https://beam.apache.org/documentation/programming-guide/#pardo) that lets you process elements in a non-monolithic way. Splittable DoFn makes creating complex, modular I/O connectors in Beam easier. +When you apply a splittable DoFn to an element, the runner can split the element’s processing into smaller tasks. You can checkpoint the processing of an element, and you can split the remaining work to yield additional parallelism. + +At a high level, an SDF is responsible for processing element and restriction pairs. A restriction represents a subset of work that would have been necessary to have been done when processing the element. + +Executing an [Splittable DoFn](https://beam.apache.org/documentation/programming-guide/#splittable-dofns) follows the following steps: +1. Each element is paired with a restriction (e.g. filename is paired with offset range representing the whole file). +2. Each element and restriction pair is split (e.g. offset ranges are broken up into smaller pieces). +3. The runner redistributes the element and restriction pairs to several workers. +4. Element and restriction pairs are processed in parallel (e.g. the file is read). Within this last step, the element and restriction pair can pause its own processing and/or be split into further element and restriction pairs. + +See Tour of Beam [Splittable DoFn module](https://tour.beam.apache.org/tour/python/splittable-dofn/splittable) for a practical example. Review Comment: ```suggestion For an example, see the [Splittable DoFn module](https://tour.beam.apache.org/tour/python/splittable-dofn/splittable) in the Tour of Beam. ``` ########## learning/prompts/documentation-lookup/16_advanced_pipeline_lifecycle.md: ########## @@ -0,0 +1,36 @@ +Prompt: +What is a pipeline development lifecycle in Apache Beam? +Response: + +The Apache Beam pipeline development lifecycle is an iterative process that usually involves the following steps: + +- Design your pipeline. +- Develop your pipeline code. +- Test your pipeline. +- Deploy your pipeline. + +On each iteration, you may need to go back and forth between the different steps to refine your pipeline code and fix any bugs you find. Review Comment: ```suggestion During each iteration, you might need to go back and forth between the different steps to refine your pipeline code and to fix bugs. ``` ########## learning/prompts/documentation-lookup/20_io_biguery.md: ########## @@ -0,0 +1,39 @@ +Prompt: +Is BigQuery supported in Apache Beam? 
+Response: +[BigQuery](https://cloud.google.com/bigquery) is a[ Google Cloud](https://cloud.google.com/) serverless and cost-effective enterprise data warehouse. Apache Beam provides a BigQueryIO connector to read and write data from and to BigQuery. BigQueryIO supports both batch and streaming pipelines. + +BigQueryIO is supported in the following Beam SDKs: Review Comment: ```suggestion The following Apache Beam SDKs support the `BigQueryIO` connector: ``` ########## learning/prompts/documentation-lookup/15_advanced_xlang.md: ########## @@ -0,0 +1,15 @@ +Prompt: +What is a multi-language pipeline in Apache Beam? +Response: +Beam lets you combine transforms written in any supported SDK language (currently, [Java](https://beam.apache.org/documentation/programming-guide/#1311-creating-cross-language-java-transforms) and [Python](https://beam.apache.org/documentation/programming-guide/#1312-creating-cross-language-python-transforms)) and use them in one multi-language pipeline. For example, a pipeline that reads from a Python source processes the data using a Java transform, and writes the data to a Python sink is a multi-language pipeline. + +For example, the [Apache Kafka connector](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/kafka.py) and [SQL transform](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/sql.py) from the Java SDK can be used in Python pipelines. Review Comment: ```suggestion For example, you can use the [Apache Kafka connector](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/kafka.py) and [SQL transform](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/sql.py) from the Java SDK in Python pipelines. ``` ########## learning/prompts/documentation-lookup/16_advanced_pipeline_lifecycle.md: ########## @@ -0,0 +1,36 @@ +Prompt: +What is a pipeline development lifecycle in Apache Beam? +Response: + +The Apache Beam pipeline development lifecycle is an iterative process that usually involves the following steps: + +- Design your pipeline. +- Develop your pipeline code. +- Test your pipeline. +- Deploy your pipeline. + +On each iteration, you may need to go back and forth between the different steps to refine your pipeline code and fix any bugs you find. + +Designing a pipeline addresses the following questions: Review Comment: ```suggestion To design a pipeline, you need answers to the following questions: ``` ########## learning/prompts/documentation-lookup/13_advanced_splittable_dofn.md: ########## @@ -0,0 +1,18 @@ +Prompt: +What is Splittable DoFn in Apache Beam? +Response: +Splittable DoFn (SDF) is a generalization of [DoFn](https://beam.apache.org/documentation/programming-guide/#pardo) that lets you process elements in a non-monolithic way. Splittable DoFn makes creating complex, modular I/O connectors in Beam easier. +When you apply a splittable DoFn to an element, the runner can split the element’s processing into smaller tasks. You can checkpoint the processing of an element, and you can split the remaining work to yield additional parallelism. + +At a high level, an SDF is responsible for processing element and restriction pairs. A restriction represents a subset of work that would have been necessary to have been done when processing the element. 
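For illustration, a minimal Python sketch of this element-and-restriction pairing might look like the following. The `(filename, size)` element type and the class names are hypothetical; the sketch assumes the Python SDK's `RestrictionProvider`, `OffsetRange`, and `OffsetRestrictionTracker` APIs.

```python
import apache_beam as beam
from apache_beam.io.restriction_trackers import OffsetRange, OffsetRestrictionTracker
from apache_beam.transforms.core import RestrictionProvider

class WholeFileRestrictionProvider(RestrictionProvider):
    # Pair each element with a restriction covering the whole file.
    def initial_restriction(self, element):
        filename, size = element  # hypothetical (filename, size) element
        return OffsetRange(0, size)

    def create_tracker(self, restriction):
        return OffsetRestrictionTracker(restriction)

    def restriction_size(self, element, restriction):
        return restriction.size()

class ProcessFileFn(beam.DoFn):
    def process(
        self,
        element,
        restriction_tracker=beam.DoFn.RestrictionParam(WholeFileRestrictionProvider())):
        filename, _ = element
        position = restriction_tracker.current_restriction().start
        # Claim offsets one at a time; the runner can split the unclaimed
        # remainder of the range into new element/restriction pairs.
        while restriction_tracker.try_claim(position):
            yield (filename, position)  # stand-in for reading a record at this offset
            position += 1
```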
+ +Executing an [Splittable DoFn](https://beam.apache.org/documentation/programming-guide/#splittable-dofns) follows the following steps: Review Comment: ```suggestion Executing a [Splittable `DoFn`](https://beam.apache.org/documentation/programming-guide/#splittable-dofns) uses the following steps: ``` ########## learning/prompts/documentation-lookup/15_advanced_xlang.md: ########## @@ -0,0 +1,15 @@ +Prompt: +What is a multi-language pipeline in Apache Beam? +Response: +Beam lets you combine transforms written in any supported SDK language (currently, [Java](https://beam.apache.org/documentation/programming-guide/#1311-creating-cross-language-java-transforms) and [Python](https://beam.apache.org/documentation/programming-guide/#1312-creating-cross-language-python-transforms)) and use them in one multi-language pipeline. For example, a pipeline that reads from a Python source processes the data using a Java transform, and writes the data to a Python sink is a multi-language pipeline. + +For example, the [Apache Kafka connector](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/kafka.py) and [SQL transform](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/sql.py) from the Java SDK can be used in Python pipelines. + +See quickstart examples for [Java](https://beam.apache.org/documentation/sdks/java-multi-language-pipelines) and [Python](https://beam.apache.org/documentation/sdks/python-multi-language-pipelines) to learn how to create a multi-language pipeline. + +Depending on the SDK language of the pipeline, you can use a high-level SDK-wrapper class, or a low-level transform class to access a cross-language transform. See [Using cross-language transforms](https://beam.apache.org/documentation/programming-guide/#use-x-lang-transforms) section of Apache Beam Documentation. + +Developing a cross-language transform involves defining a Uniform Resourse Name(URN) for registering the transform with an expansion service. See [Defining a URN](https://beam.apache.org/documentation/programming-guide/#1314-defining-a-urn) for additional information and examples. Review Comment: ```suggestion To develop a cross-language transform, you need to define a Uniform Resource Name (URN) for registering the transform with an expansion service. For more information, see [Defining a URN](https://beam.apache.org/documentation/programming-guide/#1314-defining-a-urn). ``` ########## learning/prompts/documentation-lookup/20_io_biguery.md: ########## @@ -0,0 +1,39 @@ +Prompt: +Is BigQuery supported in Apache Beam? +Response: +[BigQuery](https://cloud.google.com/bigquery) is a[ Google Cloud](https://cloud.google.com/) serverless and cost-effective enterprise data warehouse. Apache Beam provides a BigQueryIO connector to read and write data from and to BigQuery. BigQueryIO supports both batch and streaming pipelines. 
+ +BigQueryIO is supported in the following Beam SDKs: +* [Java](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.html) +* [Python](https://beam.apache.org/releases/pydoc/current/apache_beam.io.gcp.bigquery.html) +* [Go](https://pkg.go.dev/github.com/apache/beam/sdks/v2/go/pkg/beam/io/bigqueryio) native and via [X-language](https://pkg.go.dev/github.com/apache/beam/sdks/v2/go/pkg/beam/io/xlang/bigqueryio) +* [Typescript](https://github.com/apache/beam/blob/master/sdks/typescript/src/apache_beam/io/bigqueryio.ts) via X-language Review Comment: ```suggestion * [Typescript](https://github.com/apache/beam/blob/master/sdks/typescript/src/apache_beam/io/bigqueryio.ts) through X Language ``` ########## learning/prompts/documentation-lookup/15_advanced_xlang.md: ########## @@ -0,0 +1,15 @@ +Prompt: +What is a multi-language pipeline in Apache Beam? +Response: +Beam lets you combine transforms written in any supported SDK language (currently, [Java](https://beam.apache.org/documentation/programming-guide/#1311-creating-cross-language-java-transforms) and [Python](https://beam.apache.org/documentation/programming-guide/#1312-creating-cross-language-python-transforms)) and use them in one multi-language pipeline. For example, a pipeline that reads from a Python source processes the data using a Java transform, and writes the data to a Python sink is a multi-language pipeline. + +For example, the [Apache Kafka connector](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/kafka.py) and [SQL transform](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/sql.py) from the Java SDK can be used in Python pipelines. + +See quickstart examples for [Java](https://beam.apache.org/documentation/sdks/java-multi-language-pipelines) and [Python](https://beam.apache.org/documentation/sdks/python-multi-language-pipelines) to learn how to create a multi-language pipeline. Review Comment: ```suggestion To learn how to create a multi-language pipeline, see the quickstart examples for [Java](https://beam.apache.org/documentation/sdks/java-multi-language-pipelines) and [Python](https://beam.apache.org/documentation/sdks/python-multi-language-pipelines). ``` ########## learning/prompts/documentation-lookup/19_io_pubsub.md: ########## @@ -0,0 +1,25 @@ +Prompt: +Is PubSub supported in Apache Beam? +Response: +[PubSub](https://cloud.google.com/pubsub) is a[ Google Cloud](https://cloud.google.com/) service that provides a simple, reliable, scalable, and secure real-time messaging service for sending and receiving messages between independent applications. Apache Beam provides a PubSubIO connector that allows you to read and write messages from and to PubSub. +PubSub is currently supported only in streaming pipelines. + + +PubSub is supported in the following Beam SDKs: +* [Java](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/gcp/pubsub/PubsubIO.html) +* [Python](https://beam.apache.org/releases/pydoc/current/apache_beam.io.gcp.pubsub.html) +* [Go](https://pkg.go.dev/github.com/apache/beam/sdks/v2/go/pkg/beam/io/pubsubio) +* [Typescript](https://github.com/apache/beam/blob/master/sdks/typescript/src/apache_beam/io/pubsub.ts) via X-language + +[Dataflow-cookbook repository](https://github.com/GoogleCloudPlatform/dataflow-cookbook) will help you to get started with PubSub and Apache Beam. 
See here for [read](https://github.com/GoogleCloudPlatform/dataflow-cookbook/blob/main/Python/pubsub/read_pubsub_multiple.py) and [write](https://github.com/GoogleCloudPlatform/dataflow-cookbook/blob/main/Python/pubsub/write_pubsub.py) examples in Python. Review Comment: ```suggestion To get started with Pub/Sub and Apache Beam, see the [Dataflow-cookbook repository](https://github.com/GoogleCloudPlatform/dataflow-cookbook) in GitHub. For Python read examples, see the [read_pubsub_multiple.py](https://github.com/GoogleCloudPlatform/dataflow-cookbook/blob/main/Python/pubsub/read_pubsub_multiple.py) example. For Python write examples, see the [write_pubsub.py](https://github.com/GoogleCloudPlatform/dataflow-cookbook/blob/main/Python/pubsub/write_pubsub.py) example. ``` ########## learning/prompts/documentation-lookup/15_advanced_xlang.md: ########## @@ -0,0 +1,15 @@ +Prompt: +What is a multi-language pipeline in Apache Beam? +Response: +Beam lets you combine transforms written in any supported SDK language (currently, [Java](https://beam.apache.org/documentation/programming-guide/#1311-creating-cross-language-java-transforms) and [Python](https://beam.apache.org/documentation/programming-guide/#1312-creating-cross-language-python-transforms)) and use them in one multi-language pipeline. For example, a pipeline that reads from a Python source processes the data using a Java transform, and writes the data to a Python sink is a multi-language pipeline. + +For example, the [Apache Kafka connector](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/kafka.py) and [SQL transform](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/sql.py) from the Java SDK can be used in Python pipelines. + +See quickstart examples for [Java](https://beam.apache.org/documentation/sdks/java-multi-language-pipelines) and [Python](https://beam.apache.org/documentation/sdks/python-multi-language-pipelines) to learn how to create a multi-language pipeline. + +Depending on the SDK language of the pipeline, you can use a high-level SDK-wrapper class, or a low-level transform class to access a cross-language transform. See [Using cross-language transforms](https://beam.apache.org/documentation/programming-guide/#use-x-lang-transforms) section of Apache Beam Documentation. Review Comment: ```suggestion Depending on the SDK language of the pipeline, you can use a high-level SDK-wrapper class or a low-level transform class to access a cross-language transform. For more information, see [Using cross-language transforms](https://beam.apache.org/documentation/programming-guide/#use-x-lang-transforms).
``` ########## learning/prompts/documentation-lookup/20_io_biguery.md: ########## @@ -0,0 +1,39 @@ +Prompt: +Is BigQuery supported in Apache Beam? +Response: +[BigQuery](https://cloud.google.com/bigquery) is a[ Google Cloud](https://cloud.google.com/) serverless and cost-effective enterprise data warehouse. Apache Beam provides a BigQueryIO connector to read and write data from and to BigQuery. BigQueryIO supports both batch and streaming pipelines. + +BigQueryIO is supported in the following Beam SDKs: +* [Java](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.html) +* [Python](https://beam.apache.org/releases/pydoc/current/apache_beam.io.gcp.bigquery.html) +* [Go](https://pkg.go.dev/github.com/apache/beam/sdks/v2/go/pkg/beam/io/bigqueryio) native and via [X-language](https://pkg.go.dev/github.com/apache/beam/sdks/v2/go/pkg/beam/io/xlang/bigqueryio) +* [Typescript](https://github.com/apache/beam/blob/master/sdks/typescript/src/apache_beam/io/bigqueryio.ts) via X-language + +`ReadFromBigQuery` is used to read data from BigQuery. Data can be read from a BigQuery table or using a SQL query. The default mode is to return table rows read from a BigQuery source as dictionaries. Native `TableRow` objects can also be returned if desired. + +Reading from BigQuery in its simplest form could be something like: + +```python +from apache_beam.io.gcp.bigquery import ReadFromBigQuery + +with beam.Pipeline(options=options) as p: + # read from a table + lines_table = p | 'Read' >> ReadFromBigQuery(table=table) + # read from a query + lines_query = p | 'Read' >> ReadFromBigQuery(query="SELECT * FROM table") + +``` +Writing to BigQuery in its simplest form could be something like: + +```python +from apache_beam.io.gcp.bigquery import WriteToBigQuery + +with beam.Pipeline(options=options) as p: + # write to a table + p | 'Write' >> beam.io.WriteToBigQuery( + table, + schema=TABLE_SCHEMA, + create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED, + write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND) +``` +[Dataflow-cookbook repository](https://github.com/GoogleCloudPlatform/dataflow-cookbook) will help you to get started with BigQuery and Apache Beam. See here for [read](https://github.com/GoogleCloudPlatform/dataflow-cookbook/blob/main/Python/bigquery/read_table_bigquery.py) and [write](https://github.com/GoogleCloudPlatform/dataflow-cookbook/blob/main/Python/bigquery/write_bigquery.py) examples in Python. Review Comment: ```suggestion To get started with BigQuery and Apache Beam, see the examples in the [Dataflow-cookbook repository](https://github.com/GoogleCloudPlatform/dataflow-cookbook) in GitHub. For Python read examples, see the [read_table_bigquery.py](https://github.com/GoogleCloudPlatform/dataflow-cookbook/blob/main/Python/bigquery/read_table_bigquery.py) example. For Python write examples, see the [write_bigquery.py](https://github.com/GoogleCloudPlatform/dataflow-cookbook/blob/main/Python/bigquery/write_bigquery.py) example. ``` ########## learning/prompts/documentation-lookup/20_io_biguery.md: ########## @@ -0,0 +1,39 @@ +Prompt: +Is BigQuery supported in Apache Beam? +Response: +[BigQuery](https://cloud.google.com/bigquery) is a[ Google Cloud](https://cloud.google.com/) serverless and cost-effective enterprise data warehouse. Apache Beam provides a BigQueryIO connector to read and write data from and to BigQuery. BigQueryIO supports both batch and streaming pipelines. 
+ +BigQueryIO is supported in the following Beam SDKs: +* [Java](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.html) +* [Python](https://beam.apache.org/releases/pydoc/current/apache_beam.io.gcp.bigquery.html) +* [Go](https://pkg.go.dev/github.com/apache/beam/sdks/v2/go/pkg/beam/io/bigqueryio) native and via [X-language](https://pkg.go.dev/github.com/apache/beam/sdks/v2/go/pkg/beam/io/xlang/bigqueryio) +* [Typescript](https://github.com/apache/beam/blob/master/sdks/typescript/src/apache_beam/io/bigqueryio.ts) via X-language + +`ReadFromBigQuery` is used to read data from BigQuery. Data can be read from a BigQuery table or using a SQL query. The default mode is to return table rows read from a BigQuery source as dictionaries. Native `TableRow` objects can also be returned if desired. + +Reading from BigQuery in its simplest form could be something like: + +```python +from apache_beam.io.gcp.bigquery import ReadFromBigQuery + +with beam.Pipeline(options=options) as p: + # read from a table + lines_table = p | 'Read' >> ReadFromBigQuery(table=table) + # read from a query + lines_query = p | 'Read' >> ReadFromBigQuery(query="SELECT * FROM table") + +``` +Writing to BigQuery in its simplest form could be something like: Review Comment: ```suggestion Apache Beam pipeline code for writing to BigQuery might look like the following example: ``` ########## learning/prompts/documentation-lookup/16_advanced_pipeline_lifecycle.md: ########## @@ -0,0 +1,36 @@ +Prompt: +What is a pipeline development lifecycle in Apache Beam? +Response: + +The Apache Beam pipeline development lifecycle is an iterative process that usually involves the following steps: + +- Design your pipeline. +- Develop your pipeline code. +- Test your pipeline. +- Deploy your pipeline. + +On each iteration, you may need to go back and forth between the different steps to refine your pipeline code and fix any bugs you find. + +Designing a pipeline addresses the following questions: +- Where is my data stored? Review Comment: ```suggestion - Where is the data stored? ``` ########## learning/prompts/documentation-lookup/20_io_biguery.md: ########## @@ -0,0 +1,39 @@ +Prompt: +Is BigQuery supported in Apache Beam? +Response: +[BigQuery](https://cloud.google.com/bigquery) is a[ Google Cloud](https://cloud.google.com/) serverless and cost-effective enterprise data warehouse. Apache Beam provides a BigQueryIO connector to read and write data from and to BigQuery. BigQueryIO supports both batch and streaming pipelines. + +BigQueryIO is supported in the following Beam SDKs: +* [Java](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.html) +* [Python](https://beam.apache.org/releases/pydoc/current/apache_beam.io.gcp.bigquery.html) +* [Go](https://pkg.go.dev/github.com/apache/beam/sdks/v2/go/pkg/beam/io/bigqueryio) native and via [X-language](https://pkg.go.dev/github.com/apache/beam/sdks/v2/go/pkg/beam/io/xlang/bigqueryio) +* [Typescript](https://github.com/apache/beam/blob/master/sdks/typescript/src/apache_beam/io/bigqueryio.ts) via X-language + +`ReadFromBigQuery` is used to read data from BigQuery. Data can be read from a BigQuery table or using a SQL query. The default mode is to return table rows read from a BigQuery source as dictionaries. Native `TableRow` objects can also be returned if desired. 
+ +Reading from BigQuery in its simplest form could be something like: Review Comment: ```suggestion Apache Beam pipeline code for reading from BigQuery might look like the following example: ``` ########## learning/prompts/documentation-lookup/20_io_biguery.md: ########## @@ -0,0 +1,39 @@ +Prompt: +Is BigQuery supported in Apache Beam? +Response: +[BigQuery](https://cloud.google.com/bigquery) is a[ Google Cloud](https://cloud.google.com/) serverless and cost-effective enterprise data warehouse. Apache Beam provides a BigQueryIO connector to read and write data from and to BigQuery. BigQueryIO supports both batch and streaming pipelines. + +BigQueryIO is supported in the following Beam SDKs: +* [Java](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.html) +* [Python](https://beam.apache.org/releases/pydoc/current/apache_beam.io.gcp.bigquery.html) +* [Go](https://pkg.go.dev/github.com/apache/beam/sdks/v2/go/pkg/beam/io/bigqueryio) native and via [X-language](https://pkg.go.dev/github.com/apache/beam/sdks/v2/go/pkg/beam/io/xlang/bigqueryio) +* [Typescript](https://github.com/apache/beam/blob/master/sdks/typescript/src/apache_beam/io/bigqueryio.ts) via X-language + +`ReadFromBigQuery` is used to read data from BigQuery. Data can be read from a BigQuery table or using a SQL query. The default mode is to return table rows read from a BigQuery source as dictionaries. Native `TableRow` objects can also be returned if desired. Review Comment: ```suggestion To read data from BigQuery, use `ReadFromBigQuery`. Apache Beam can read data directly from a BigQuery table or using a SQL query. The default mode is to return table rows read from a BigQuery source as dictionaries. Built-in `TableRow` objects can also be returned. ``` ########## learning/prompts/documentation-lookup/16_advanced_pipeline_lifecycle.md: ########## @@ -0,0 +1,36 @@ +Prompt: +What is a pipeline development lifecycle in Apache Beam? +Response: + +The Apache Beam pipeline development lifecycle is an iterative process that usually involves the following steps: + +- Design your pipeline. +- Develop your pipeline code. +- Test your pipeline. +- Deploy your pipeline. + +On each iteration, you may need to go back and forth between the different steps to refine your pipeline code and fix any bugs you find. + +Designing a pipeline addresses the following questions: +- Where is my data stored? +- What does your data look like? +- What do you want to do with your data? +- What does your output data look like, and where should it go? + +Beam documentation has more information on [pipeline design](https://beam.apache.org/documentation/pipelines/design-your-pipeline/) and common [pipeline patterns](https://beam.apache.org/documentation/patterns/overview/). + + +Apache Beam program expresses a data processing pipeline, from start to finish. To construct a pipeline using the classes in the Beam SDKs, your program will need to perform the following general steps: + +- Create a Pipeline object +- Use a Read or Create transform to create one or more PCollections for your pipeline data +- Apply transforms to each PCollection +- Write or otherwise output the final, transformed PCollections Review Comment: ```suggestion - Write or otherwise output the final, transformed `PCollection` objects. ``` ########## learning/prompts/documentation-lookup/16_advanced_pipeline_lifecycle.md: ########## @@ -0,0 +1,36 @@ +Prompt: +What is a pipeline development lifecycle in Apache Beam? 
+Response: + +The Apache Beam pipeline development lifecycle is an iterative process that usually involves the following steps: + +- Design your pipeline. +- Develop your pipeline code. +- Test your pipeline. +- Deploy your pipeline. + +On each iteration, you may need to go back and forth between the different steps to refine your pipeline code and fix any bugs you find. + +Designing a pipeline addresses the following questions: +- Where is my data stored? +- What does your data look like? +- What do you want to do with your data? +- What does your output data look like, and where should it go? + +Beam documentation has more information on [pipeline design](https://beam.apache.org/documentation/pipelines/design-your-pipeline/) and common [pipeline patterns](https://beam.apache.org/documentation/patterns/overview/). + + +Apache Beam program expresses a data processing pipeline, from start to finish. To construct a pipeline using the classes in the Beam SDKs, your program will need to perform the following general steps: + +- Create a Pipeline object Review Comment: ```suggestion - Create a `Pipeline` object. ``` ########## learning/prompts/documentation-lookup/16_advanced_pipeline_lifecycle.md: ########## @@ -0,0 +1,36 @@ +Prompt: +What is a pipeline development lifecycle in Apache Beam? +Response: + +The Apache Beam pipeline development lifecycle is an iterative process that usually involves the following steps: + +- Design your pipeline. +- Develop your pipeline code. +- Test your pipeline. +- Deploy your pipeline. + +On each iteration, you may need to go back and forth between the different steps to refine your pipeline code and fix any bugs you find. + +Designing a pipeline addresses the following questions: +- Where is my data stored? +- What does your data look like? +- What do you want to do with your data? +- What does your output data look like, and where should it go? + +Beam documentation has more information on [pipeline design](https://beam.apache.org/documentation/pipelines/design-your-pipeline/) and common [pipeline patterns](https://beam.apache.org/documentation/patterns/overview/). + + +Apache Beam program expresses a data processing pipeline, from start to finish. To construct a pipeline using the classes in the Beam SDKs, your program will need to perform the following general steps: Review Comment: ```suggestion An Apache Beam program expresses a data processing pipeline, from start to finish. To construct a pipeline using the classes in the Apache Beam SDKs, your program needs to perform the following steps: ``` ########## learning/prompts/documentation-lookup/13_advanced_splittable_dofn.md: ########## @@ -0,0 +1,18 @@ +Prompt: +What is Splittable DoFn in Apache Beam? +Response: +Splittable DoFn (SDF) is a generalization of [DoFn](https://beam.apache.org/documentation/programming-guide/#pardo) that lets you process elements in a non-monolithic way. Splittable DoFn makes creating complex, modular I/O connectors in Beam easier. +When you apply a splittable DoFn to an element, the runner can split the element’s processing into smaller tasks. You can checkpoint the processing of an element, and you can split the remaining work to yield additional parallelism. Review Comment: ```suggestion When you apply a splittable `DoFn` to an element, the runner can split the element’s processing into smaller tasks. You can checkpoint the processing of an element, and you can split the remaining work to yield additional parallelism. 
``` ########## learning/prompts/documentation-lookup/16_advanced_pipeline_lifecycle.md: ########## @@ -0,0 +1,36 @@ +Prompt: +What is a pipeline development lifecycle in Apache Beam? +Response: + +The Apache Beam pipeline development lifecycle is an iterative process that usually involves the following steps: + +- Design your pipeline. +- Develop your pipeline code. +- Test your pipeline. +- Deploy your pipeline. + +On each iteration, you may need to go back and forth between the different steps to refine your pipeline code and fix any bugs you find. + +Designing a pipeline addresses the following questions: +- Where is my data stored? +- What does your data look like? +- What do you want to do with your data? +- What does your output data look like, and where should it go? + +Beam documentation has more information on [pipeline design](https://beam.apache.org/documentation/pipelines/design-your-pipeline/) and common [pipeline patterns](https://beam.apache.org/documentation/patterns/overview/). + + +Apache Beam program expresses a data processing pipeline, from start to finish. To construct a pipeline using the classes in the Beam SDKs, your program will need to perform the following general steps: + +- Create a Pipeline object +- Use a Read or Create transform to create one or more PCollections for your pipeline data +- Apply transforms to each PCollection +- Write or otherwise output the final, transformed PCollections +- Run the pipeline Review Comment: ```suggestion - Run the pipeline. ``` ########## learning/prompts/documentation-lookup/16_advanced_pipeline_lifecycle.md: ########## @@ -0,0 +1,36 @@ +Prompt: +What is a pipeline development lifecycle in Apache Beam? +Response: + +The Apache Beam pipeline development lifecycle is an iterative process that usually involves the following steps: + +- Design your pipeline. +- Develop your pipeline code. +- Test your pipeline. +- Deploy your pipeline. + +On each iteration, you may need to go back and forth between the different steps to refine your pipeline code and fix any bugs you find. + +Designing a pipeline addresses the following questions: +- Where is my data stored? +- What does your data look like? +- What do you want to do with your data? +- What does your output data look like, and where should it go? + +Beam documentation has more information on [pipeline design](https://beam.apache.org/documentation/pipelines/design-your-pipeline/) and common [pipeline patterns](https://beam.apache.org/documentation/patterns/overview/). + + +Apache Beam program expresses a data processing pipeline, from start to finish. To construct a pipeline using the classes in the Beam SDKs, your program will need to perform the following general steps: + +- Create a Pipeline object +- Use a Read or Create transform to create one or more PCollections for your pipeline data Review Comment: ```suggestion - Use a `Read` or `Create` transform to create one or more `PCollection` objects for your pipeline data. ``` ########## learning/prompts/documentation-lookup/19_io_pubsub.md: ########## @@ -0,0 +1,25 @@ +Prompt: +Is PubSub supported in Apache Beam? +Response: +[PubSub](https://cloud.google.com/pubsub) is a[ Google Cloud](https://cloud.google.com/) service that provides a simple, reliable, scalable, and secure real-time messaging service for sending and receiving messages between independent applications. Apache Beam provides a PubSubIO connector that allows you to read and write messages from and to PubSub. 
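For illustration, a minimal streaming Pub/Sub round trip with the Python SDK might look like the following sketch. The topic names are placeholders, and `ReadFromPubSub` emits raw bytes by default.

```python
import apache_beam as beam
from apache_beam.io.gcp.pubsub import ReadFromPubSub, WriteToPubSub
from apache_beam.options.pipeline_options import PipelineOptions

# Pub/Sub I/O requires a streaming pipeline; topic names are placeholders.
options = PipelineOptions(streaming=True)

with beam.Pipeline(options=options) as p:
    (p
     | 'Read' >> ReadFromPubSub(topic='projects/my-project/topics/input')
     | 'Decode' >> beam.Map(lambda data: data.decode('utf-8'))   # messages arrive as bytes
     | 'Transform' >> beam.Map(str.upper)
     | 'Encode' >> beam.Map(lambda text: text.encode('utf-8'))   # WriteToPubSub expects bytes
     | 'Write' >> WriteToPubSub(topic='projects/my-project/topics/output'))
```

Setting `streaming=True` matters because, as the file notes next, Pub/Sub is supported only in streaming pipelines.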
+PubSub is currently supported only in streaming pipelines. + + +PubSub is supported in the following Beam SDKs: +* [Java](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/gcp/pubsub/PubsubIO.html) +* [Python](https://beam.apache.org/releases/pydoc/current/apache_beam.io.gcp.pubsub.html) +* [Go](https://pkg.go.dev/github.com/apache/beam/sdks/v2/go/pkg/beam/io/pubsubio) +* [Typescript](https://github.com/apache/beam/blob/master/sdks/typescript/src/apache_beam/io/pubsub.ts) via X-language Review Comment: ```suggestion * [Typescript](https://github.com/apache/beam/blob/master/sdks/typescript/src/apache_beam/io/pubsub.ts) through X Language. ``` ########## learning/prompts/documentation-lookup/16_advanced_pipeline_lifecycle.md: ########## @@ -0,0 +1,36 @@ +Prompt: +What is a pipeline development lifecycle in Apache Beam? +Response: + +The Apache Beam pipeline development lifecycle is an iterative process that usually involves the following steps: + +- Design your pipeline. +- Develop your pipeline code. +- Test your pipeline. +- Deploy your pipeline. + +On each iteration, you may need to go back and forth between the different steps to refine your pipeline code and fix any bugs you find. + +Designing a pipeline addresses the following questions: +- Where is my data stored? +- What does your data look like? +- What do you want to do with your data? +- What does your output data look like, and where should it go? + +Beam documentation has more information on [pipeline design](https://beam.apache.org/documentation/pipelines/design-your-pipeline/) and common [pipeline patterns](https://beam.apache.org/documentation/patterns/overview/). + + +Apache Beam program expresses a data processing pipeline, from start to finish. To construct a pipeline using the classes in the Beam SDKs, your program will need to perform the following general steps: + +- Create a Pipeline object +- Use a Read or Create transform to create one or more PCollections for your pipeline data +- Apply transforms to each PCollection +- Write or otherwise output the final, transformed PCollections +- Run the pipeline + +Beam documentation has more on [developing](https://beam.apache.org/documentation/programming-guide/) and [executing](https://beam.apache.org/documentation/pipelines/create-your-pipeline/) pipelines. + +Testing pipelines is a particularly important step in developing an effective data processing solution. The indirect nature of the Beam model, in which your user code constructs a pipeline graph to be executed remotely, can make debugging-failed runs a non-trivial task. See [here](https://beam.apache.org/documentation/pipelines/test-your-pipeline/) for more information on pipeline testing strategies. + +Choosing a [runner](https://beam.apache.org/documentation/#choosing-a-runner) is a crucial step in deploying your pipeline. The runner you choose determines where and how your pipeline will execute. Review Comment: ```suggestion Choosing a [runner](https://beam.apache.org/documentation/#choosing-a-runner) is a crucial step in deploying your pipeline. The runner you choose determines where and how your pipeline executes. ``` ########## learning/prompts/documentation-lookup/17_advanced_ai_ml.md: ########## @@ -0,0 +1,18 @@ +Prompt: +What are AI and ML capabilities in Apache Beam? +Response: +Apache Beam has several built-in [AI and ML capabilities](https://beam.apache.org/documentation/ml/overview/) that enable you to: +- Process large datasets for both preprocessing and model inference. 
+- Conduct exploratory data analysis and smoothly scale up data pipelines in production as part of your MLOps ecosystem. +- Run your models in production with varying data loads, both in batch and streaming + +See [here](https://beam.apache.org/documentation/patterns/ai-platform/) for common AI Platform integration patterns in Apache Beam. Review Comment: ```suggestion For common AI platform integration patterns in Apache Beam, see [AI Platform integration patterns](https://beam.apache.org/documentation/patterns/ai-platform/). ``` ########## learning/prompts/documentation-lookup/16_advanced_pipeline_lifecycle.md: ########## @@ -0,0 +1,36 @@ +Prompt: +What is a pipeline development lifecycle in Apache Beam? +Response: + +The Apache Beam pipeline development lifecycle is an iterative process that usually involves the following steps: + +- Design your pipeline. +- Develop your pipeline code. +- Test your pipeline. +- Deploy your pipeline. + +On each iteration, you may need to go back and forth between the different steps to refine your pipeline code and fix any bugs you find. + +Designing a pipeline addresses the following questions: +- Where is my data stored? +- What does your data look like? +- What do you want to do with your data? +- What does your output data look like, and where should it go? + +Beam documentation has more information on [pipeline design](https://beam.apache.org/documentation/pipelines/design-your-pipeline/) and common [pipeline patterns](https://beam.apache.org/documentation/patterns/overview/). + + +Apache Beam program expresses a data processing pipeline, from start to finish. To construct a pipeline using the classes in the Beam SDKs, your program will need to perform the following general steps: + +- Create a Pipeline object +- Use a Read or Create transform to create one or more PCollections for your pipeline data +- Apply transforms to each PCollection +- Write or otherwise output the final, transformed PCollections +- Run the pipeline + +Beam documentation has more on [developing](https://beam.apache.org/documentation/programming-guide/) and [executing](https://beam.apache.org/documentation/pipelines/create-your-pipeline/) pipelines. + +Testing pipelines is a particularly important step in developing an effective data processing solution. The indirect nature of the Beam model, in which your user code constructs a pipeline graph to be executed remotely, can make debugging-failed runs a non-trivial task. See [here](https://beam.apache.org/documentation/pipelines/test-your-pipeline/) for more information on pipeline testing strategies. Review Comment: ```suggestion Testing pipelines is a particularly important step in developing an effective data processing solution. The indirect nature of the Beam model, in which your user code constructs a pipeline graph to be executed remotely, can make debugging failed runs difficult. For more information about pipeline testing strategies, see [Test Your Pipeline](https://beam.apache.org/documentation/pipelines/test-your-pipeline/). ``` ########## learning/prompts/documentation-lookup/16_advanced_pipeline_lifecycle.md: ########## @@ -0,0 +1,36 @@ +Prompt: +What is a pipeline development lifecycle in Apache Beam? +Response: + +The Apache Beam pipeline development lifecycle is an iterative process that usually involves the following steps: + +- Design your pipeline. +- Develop your pipeline code. +- Test your pipeline. +- Deploy your pipeline. 
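To make the construction steps listed in this file concrete, a minimal runnable Python pipeline might look like the following sketch; the output path is a placeholder.

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Exiting the `with` block runs the pipeline on the configured runner.
with beam.Pipeline(options=PipelineOptions()) as p:
    (p
     | 'Create' >> beam.Create(['hello', 'beam'])       # build a PCollection
     | 'Upper' >> beam.Map(str.upper)                   # apply a transform
     | 'Write' >> beam.io.WriteToText('/tmp/output'))   # output the result
```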
+ +On each iteration, you may need to go back and forth between the different steps to refine your pipeline code and fix any bugs you find. + +Designing a pipeline addresses the following questions: +- Where is my data stored? +- What does your data look like? +- What do you want to do with your data? +- What does your output data look like, and where should it go? + +Beam documentation has more information on [pipeline design](https://beam.apache.org/documentation/pipelines/design-your-pipeline/) and common [pipeline patterns](https://beam.apache.org/documentation/patterns/overview/). + + +Apache Beam program expresses a data processing pipeline, from start to finish. To construct a pipeline using the classes in the Beam SDKs, your program will need to perform the following general steps: + +- Create a Pipeline object +- Use a Read or Create transform to create one or more PCollections for your pipeline data +- Apply transforms to each PCollection +- Write or otherwise output the final, transformed PCollections +- Run the pipeline + +Beam documentation has more on [developing](https://beam.apache.org/documentation/programming-guide/) and [executing](https://beam.apache.org/documentation/pipelines/create-your-pipeline/) pipelines. + +Testing pipelines is a particularly important step in developing an effective data processing solution. The indirect nature of the Beam model, in which your user code constructs a pipeline graph to be executed remotely, can make debugging-failed runs a non-trivial task. See [here](https://beam.apache.org/documentation/pipelines/test-your-pipeline/) for more information on pipeline testing strategies. + +Choosing a [runner](https://beam.apache.org/documentation/#choosing-a-runner) is a crucial step in deploying your pipeline. The runner you choose determines where and how your pipeline will execute. +More information on deployment is available [here](https://beam.apache.org/documentation/runtime/environments/). Review Comment: ```suggestion For more information about pipeline deployment, see [Container environments](https://beam.apache.org/documentation/runtime/environments/). ``` ########## learning/prompts/documentation-lookup/17_advanced_ai_ml.md: ########## @@ -0,0 +1,18 @@ +Prompt: +What are AI and ML capabilities in Apache Beam? +Response: +Apache Beam has several built-in [AI and ML capabilities](https://beam.apache.org/documentation/ml/overview/) that enable you to: +- Process large datasets for both preprocessing and model inference. +- Conduct exploratory data analysis and smoothly scale up data pipelines in production as part of your MLOps ecosystem. +- Run your models in production with varying data loads, both in batch and streaming Review Comment: ```suggestion - Run your models in production with varying data loads, both in batch and streaming pipelines. ``` ########## learning/prompts/documentation-lookup/17_advanced_ai_ml.md: ########## @@ -0,0 +1,18 @@ +Prompt: +What are AI and ML capabilities in Apache Beam? +Response: +Apache Beam has several built-in [AI and ML capabilities](https://beam.apache.org/documentation/ml/overview/) that enable you to: +- Process large datasets for both preprocessing and model inference. +- Conduct exploratory data analysis and smoothly scale up data pipelines in production as part of your MLOps ecosystem. 
+- Run your models in production with varying data loads, both in batch and streaming + +See [here](https://beam.apache.org/documentation/patterns/ai-platform/) for common AI Platform integration patterns in Apache Beam. + +The recommended way to implement inference in Apache Beam is by using the [RunInference API](https://beam.apache.org/documentation/sdks/python-machine-learning/). See [here](https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/run_inference_pytorch_tensorflow_sklearn.ipynb) for more details on how to use RunInference for PyTorch, scikit-learn, and TensorFlow. + +Using pre-trained models in Apache Beam is also supported with [PyTorch](https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/run_inference_pytorch.ipynb), [Scikit-learn](https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/run_inference_sklearn.ipynb), and [Tensorflow](https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/run_inference_tensorflow.ipynb). Running inference on [custom models](https://beam.apache.org/documentation/ml/about-ml/#use-custom-models) is also supported. + +Apache Beam also supports automatic model refresh, which allows you to update models, hot-swapping them in a running streaming pipeline with no pause in processing the data stream, avoiding downtime. See [here](https://beam.apache.org/documentation/ml/about-ml/#automatic-model-refresh) for more details. +More on Apache Beam ML innovations for production can be found [here](https://cloud.google.com/blog/products/ai-machine-learning/dataflow-ml-innovations-on-apache-beam/). Review Comment: ```suggestion For more information about using machine learning models with Apache Beam, see [Running ML models now easier with new Dataflow ML innovations on Apache Beam](https://cloud.google.com/blog/products/ai-machine-learning/dataflow-ml-innovations-on-apache-beam/). ``` ########## learning/prompts/documentation-lookup/17_advanced_ai_ml.md: ########## @@ -0,0 +1,18 @@ +Prompt: +What are AI and ML capabilities in Apache Beam? +Response: +Apache Beam has several built-in [AI and ML capabilities](https://beam.apache.org/documentation/ml/overview/) that enable you to: +- Process large datasets for both preprocessing and model inference. +- Conduct exploratory data analysis and smoothly scale up data pipelines in production as part of your MLOps ecosystem. +- Run your models in production with varying data loads, both in batch and streaming + +See [here](https://beam.apache.org/documentation/patterns/ai-platform/) for common AI Platform integration patterns in Apache Beam. + +The recommended way to implement inference in Apache Beam is by using the [RunInference API](https://beam.apache.org/documentation/sdks/python-machine-learning/). See [here](https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/run_inference_pytorch_tensorflow_sklearn.ipynb) for more details on how to use RunInference for PyTorch, scikit-learn, and TensorFlow. + +Using pre-trained models in Apache Beam is also supported with [PyTorch](https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/run_inference_pytorch.ipynb), [Scikit-learn](https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/run_inference_sklearn.ipynb), and [Tensorflow](https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/run_inference_tensorflow.ipynb). Running inference on [custom models](https://beam.apache.org/documentation/ml/about-ml/#use-custom-models) is also supported. 
+
+Apache Beam also supports automatic model refresh, which allows you to update models, hot-swapping them in a running streaming pipeline with no pause in processing the data stream, avoiding downtime. See [here](https://beam.apache.org/documentation/ml/about-ml/#automatic-model-refresh) for more details.
+More on Apache Beam ML innovations for production can be found [here](https://cloud.google.com/blog/products/ai-machine-learning/dataflow-ml-innovations-on-apache-beam/).
+
+For more hands-on examples of using Apache Beam ML integration, see [here](https://beam.apache.org/documentation/patterns/bqml/)

Review Comment:
```suggestion
For an example that uses the Apache Beam ML integration, see [BigQuery ML integration](https://beam.apache.org/documentation/patterns/bqml/).
```

########## learning/prompts/documentation-lookup/17_advanced_ai_ml.md: ##########
@@ -0,0 +1,18 @@
+Prompt:
+What are AI and ML capabilities in Apache Beam?
+Response:
+Apache Beam has several built-in [AI and ML capabilities](https://beam.apache.org/documentation/ml/overview/) that enable you to:
+- Process large datasets for both preprocessing and model inference.
+- Conduct exploratory data analysis and smoothly scale up data pipelines in production as part of your MLOps ecosystem.
+- Run your models in production with varying data loads, both in batch and streaming
+
+See [here](https://beam.apache.org/documentation/patterns/ai-platform/) for common AI Platform integration patterns in Apache Beam.
+
+The recommended way to implement inference in Apache Beam is by using the [RunInference API](https://beam.apache.org/documentation/sdks/python-machine-learning/). See [here](https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/run_inference_pytorch_tensorflow_sklearn.ipynb) for more details on how to use RunInference for PyTorch, scikit-learn, and TensorFlow.
+
+Using pre-trained models in Apache Beam is also supported with [PyTorch](https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/run_inference_pytorch.ipynb), [Scikit-learn](https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/run_inference_sklearn.ipynb), and [Tensorflow](https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/run_inference_tensorflow.ipynb). Running inference on [custom models](https://beam.apache.org/documentation/ml/about-ml/#use-custom-models) is also supported.
+
+Apache Beam also supports automatic model refresh, which allows you to update models, hot-swapping them in a running streaming pipeline with no pause in processing the data stream, avoiding downtime. See [here](https://beam.apache.org/documentation/ml/about-ml/#automatic-model-refresh) for more details.

Review Comment:
```suggestion
Apache Beam also supports automatically updating the model being used with the `RunInference` `PTransform` in streaming pipelines without stopping the pipeline. The feature lets you avoid downtime. For more information, see [Automatic model refresh](https://beam.apache.org/documentation/ml/about-ml/#automatic-model-refresh).
```

########## learning/prompts/documentation-lookup/19_io_pubsub.md: ##########
@@ -0,0 +1,25 @@
+Prompt:
+Is PubSub supported in Apache Beam?
+Response:
+[PubSub](https://cloud.google.com/pubsub) is a[ Google Cloud](https://cloud.google.com/) service that provides a simple, reliable, scalable, and secure real-time messaging service for sending and receiving messages between independent applications.
Apache Beam provides a PubSubIO connector that allows you to read and write messages from and to PubSub.
+PubSub is currently supported only in streaming pipelines.

Review Comment:
```suggestion
Pub/Sub integrations are supported only in streaming pipelines.
```

########## learning/prompts/documentation-lookup/19_io_pubsub.md: ##########
@@ -0,0 +1,25 @@
+Prompt:
+Is PubSub supported in Apache Beam?
+Response:
+[PubSub](https://cloud.google.com/pubsub) is a[ Google Cloud](https://cloud.google.com/) service that provides a simple, reliable, scalable, and secure real-time messaging service for sending and receiving messages between independent applications. Apache Beam provides a PubSubIO connector that allows you to read and write messages from and to PubSub.
+PubSub is currently supported only in streaming pipelines.
+
+
+PubSub is supported in the following Beam SDKs:

Review Comment:
```suggestion
The following Apache Beam SDKs support Pub/Sub:
```

########## learning/prompts/documentation-lookup/20_io_biguery.md: ##########
@@ -0,0 +1,39 @@
+Prompt:
+Is BigQuery supported in Apache Beam?
+Response:
+[BigQuery](https://cloud.google.com/bigquery) is a[ Google Cloud](https://cloud.google.com/) serverless and cost-effective enterprise data warehouse. Apache Beam provides a BigQueryIO connector to read and write data from and to BigQuery. BigQueryIO supports both batch and streaming pipelines.
+
+BigQueryIO is supported in the following Beam SDKs:
+* [Java](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.html)
+* [Python](https://beam.apache.org/releases/pydoc/current/apache_beam.io.gcp.bigquery.html)
+* [Go](https://pkg.go.dev/github.com/apache/beam/sdks/v2/go/pkg/beam/io/bigqueryio) native and via [X-language](https://pkg.go.dev/github.com/apache/beam/sdks/v2/go/pkg/beam/io/xlang/bigqueryio)

Review Comment:
```suggestion
* [Go](https://pkg.go.dev/github.com/apache/beam/sdks/v2/go/pkg/beam/io/bigqueryio) native and through [X Language](https://pkg.go.dev/github.com/apache/beam/sdks/v2/go/pkg/beam/io/xlang/bigqueryio)
```

########## learning/prompts/documentation-lookup/20_io_biguery.md: ##########
@@ -0,0 +1,39 @@
+Prompt:
+Is BigQuery supported in Apache Beam?
+Response:
+[BigQuery](https://cloud.google.com/bigquery) is a[ Google Cloud](https://cloud.google.com/) serverless and cost-effective enterprise data warehouse. Apache Beam provides a BigQueryIO connector to read and write data from and to BigQuery. BigQueryIO supports both batch and streaming pipelines.

Review Comment:
```suggestion
Yes, Apache Beam supports BigQuery. [BigQuery](https://cloud.google.com/bigquery) is a serverless and cost-effective enterprise data warehouse offered by [Google Cloud](https://cloud.google.com/). Apache Beam provides a `BigQueryIO` connector to read and write data to and from BigQuery. The `BigQueryIO` connector supports both batch and streaming pipelines.
```

########## learning/prompts/documentation-lookup/19_io_pubsub.md: ##########
@@ -0,0 +1,25 @@
+Prompt:
+Is PubSub supported in Apache Beam?
+Response:
+[PubSub](https://cloud.google.com/pubsub) is a[ Google Cloud](https://cloud.google.com/) service that provides a simple, reliable, scalable, and secure real-time messaging service for sending and receiving messages between independent applications. Apache Beam provides a PubSubIO connector that allows you to read and write messages from and to PubSub.
+PubSub is currently supported only in streaming pipelines.
+
+
+PubSub is supported in the following Beam SDKs:
+* [Java](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/gcp/pubsub/PubsubIO.html)
+* [Python](https://beam.apache.org/releases/pydoc/current/apache_beam.io.gcp.pubsub.html)
+* [Go](https://pkg.go.dev/github.com/apache/beam/sdks/v2/go/pkg/beam/io/pubsubio)
+* [Typescript](https://github.com/apache/beam/blob/master/sdks/typescript/src/apache_beam/io/pubsub.ts) via X-language
+
+[Dataflow-cookbook repository](https://github.com/GoogleCloudPlatform/dataflow-cookbook) will help you to get started with PubSub and Apache Beam. See here for [read](https://github.com/GoogleCloudPlatform/dataflow-cookbook/blob/main/Python/pubsub/read_pubsub_multiple.py) and [write](https://github.com/GoogleCloudPlatform/dataflow-cookbook/blob/main/Python/pubsub/write_pubsub.py) examples in Python.
+
+```python

Review Comment:
We should introduce this code with a description of what it is and why it's here.
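As the comment above suggests, the truncated `python` fence in the quoted diff needs an introduction. For illustration only, the kind of snippet the linked cookbook examples describe is a streaming read/write against Pub/Sub; a minimal sketch follows, where the project, topic, and subscription names are placeholders rather than values from the PR:

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Pub/Sub reads and writes require a streaming pipeline.
options = PipelineOptions(streaming=True)

with beam.Pipeline(options=options) as pipeline:
    (
        pipeline
        # ReadFromPubSub yields raw message payloads as bytes.
        | "Read" >> beam.io.ReadFromPubSub(
            subscription="projects/your-project/subscriptions/your-subscription"
        )
        | "Decode" >> beam.Map(lambda payload: payload.decode("utf-8"))
        # ... transform the messages here ...
        | "Encode" >> beam.Map(lambda text: text.encode("utf-8"))
        # WriteToPubSub expects bytes.
        | "Write" >> beam.io.WriteToPubSub(
            topic="projects/your-project/topics/your-topic"
        )
    )
```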
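Similarly, the five pipeline-construction steps quoted earlier from 16_advanced_pipeline_lifecycle.md map one-to-one onto Python SDK code. A minimal word-count sketch of those steps, with the input data inlined for illustration:

```python
import apache_beam as beam

# 1. Create a Pipeline object.
with beam.Pipeline() as pipeline:
    # 2. Use a Create transform to build an initial PCollection.
    lines = pipeline | "Create" >> beam.Create(["to be or not to be"])
    # 3. Apply transforms to each PCollection.
    counts = (
        lines
        | "Split" >> beam.FlatMap(str.split)
        | "PairWithOne" >> beam.Map(lambda word: (word, 1))
        | "SumPerWord" >> beam.CombinePerKey(sum)
    )
    # 4. Output the final, transformed PCollection.
    counts | "Print" >> beam.Map(print)
    # 5. The pipeline runs when the `with` block exits.
```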
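The RunInference discussion quoted from 17_advanced_ai_ml.md could likewise carry a short example. A sketch using the scikit-learn model handler, assuming a pickled model at a placeholder path:

```python
import numpy
import apache_beam as beam
from apache_beam.ml.inference.base import RunInference
from apache_beam.ml.inference.sklearn_inference import SklearnModelHandlerNumpy

# The model URI is a placeholder; any pickled scikit-learn model works.
model_handler = SklearnModelHandlerNumpy(model_uri="gs://your-bucket/model.pkl")

with beam.Pipeline() as pipeline:
    (
        pipeline
        | "Examples" >> beam.Create([numpy.array([1.0, 2.0]), numpy.array([3.0, 4.0])])
        # RunInference pairs each example with the model's prediction.
        | "Infer" >> RunInference(model_handler)
        | "Print" >> beam.Map(print)
    )
```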
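And for the `BigQueryIO` connector described in 20_io_biguery.md, a read/write sketch in the Python SDK. The destination table, dataset, and schema are placeholders, and an actual run also needs a Cloud Storage temp location for BigQuery exports; only the public `bigquery-public-data.samples.shakespeare` table is real:

```python
import apache_beam as beam

with beam.Pipeline() as pipeline:
    (
        pipeline
        # Rows arrive as dictionaries keyed by column name.
        | "Read" >> beam.io.ReadFromBigQuery(
            query="SELECT word, word_count FROM `bigquery-public-data.samples.shakespeare`",
            use_standard_sql=True,
        )
        | "Write" >> beam.io.WriteToBigQuery(
            "your-project:your_dataset.word_counts",
            schema="word:STRING,word_count:INTEGER",
            create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
        )
    )
```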
