olehborysevych commented on code in PR #29507:
URL: https://github.com/apache/beam/pull/29507#discussion_r1411018332
##########
learning/prompts/documentation-lookup/02_basic_pipelines.md:
##########
@@ -0,0 +1,20 @@
+Prompt:
+What is a Pipeline in Apache Beam?
+Response:
+A [Pipeline](https://beam.apache.org/documentation/pipelines/design-your-pipeline/) in Apache Beam serves as an abstraction that encapsulates the entirety of a data processing task, including all the data and each step of the process. Essentially, it's a [Directed Acyclic Graph](https://en.wikipedia.org/wiki/Directed_acyclic_graph) of transformations (known as `PTransform`s) applied to collections of data (`PCollection`s).
+
+The simplest pipelines in Apache Beam follow a linear flow of operations, typically adhering to a read-process-write pattern. However, pipelines can also be significantly more complex, featuring multiple input sources, multiple output sinks, and operations (`PTransform`s) that can both read from and output to multiple `PCollection`s.
+
+For more information on pipeline design and best practices, see the [Common Pipeline Patterns](https://beam.apache.org/documentation/patterns/overview/)
+
+To use Beam, your driver program must first create an instance of the Beam SDK class `Pipeline` (typically in the `main()` function).

Review Comment:
   Done

##########
learning/prompts/documentation-lookup/02_basic_pipelines.md:
##########
@@ -0,0 +1,20 @@
+Prompt:
+What is a Pipeline in Apache Beam?
+Response:
+A [Pipeline](https://beam.apache.org/documentation/pipelines/design-your-pipeline/) in Apache Beam serves as an abstraction that encapsulates the entirety of a data processing task, including all the data and each step of the process. Essentially, it's a [Directed Acyclic Graph](https://en.wikipedia.org/wiki/Directed_acyclic_graph) of transformations (known as `PTransform`s) applied to collections of data (`PCollection`s).
+
+The simplest pipelines in Apache Beam follow a linear flow of operations, typically adhering to a read-process-write pattern. However, pipelines can also be significantly more complex, featuring multiple input sources, multiple output sinks, and operations (`PTransform`s) that can both read from and output to multiple `PCollection`s.
+
+For more information on pipeline design and best practices, see the [Common Pipeline Patterns](https://beam.apache.org/documentation/patterns/overview/)
+
+To use Beam, your driver program must first create an instance of the Beam SDK class `Pipeline` (typically in the `main()` function).
+
+```python
+import apache_beam as beam
+
+with beam.Pipeline() as pipeline:
+  pass # build your pipeline here
+```
+
+When you create your `Pipeline`, you’ll also need to set some [configuration options](https://beam.apache.org/documentation/programming-guide/#configuring-pipeline-options). You can set your pipeline’s configuration options programmatically, but it’s often easier to set the options ahead of time (or read them from the command line) and pass them to the `Pipeline` object when you create the object.

Review Comment:
   Done
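As an illustration for readers of this thread (not part of the quoted file): the quoted text describes the typical linear read-process-write shape of a simple pipeline, while the quoted snippet only constructs an empty `Pipeline`. A minimal sketch of such a linear pipeline, using the standard `beam.Create` and `beam.Map` transforms, could look like this:

```python
import apache_beam as beam

# Minimal sketch of the linear read-process-write shape described above.
with beam.Pipeline() as pipeline:
    (
        pipeline
        | "Read" >> beam.Create(["hello", "world"])  # source: an in-memory PCollection
        | "Process" >> beam.Map(str.upper)           # one PTransform processing step
        | "Write" >> beam.Map(print)                 # sink stand-in: prints each element
    )
```

Each `|` application adds one `PTransform` to the pipeline's DAG, so the same pattern extends naturally to pipelines with multiple sources, sinks, and branches.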

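Similarly, the closing paragraph on configuration options could be illustrated with Beam's `PipelineOptions` class; a minimal sketch, with a placeholder flag value, might be:

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Sketch: build options from command-line style flags (in practice often taken
# from sys.argv) and pass them to the Pipeline at construction time.
options = PipelineOptions(["--runner=DirectRunner"])  # placeholder flag for illustration

with beam.Pipeline(options=options) as pipeline:
    pass  # build your pipeline here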