olehborysevych commented on code in PR #29507:
URL: https://github.com/apache/beam/pull/29507#discussion_r1411145342

##########
learning/prompts/documentation-lookup/02_basic_pipelines.md:
##########
@@ -0,0 +1,20 @@
+Prompt:
+What is a Pipeline in Apache Beam?
+Response:
+A [Pipeline](https://beam.apache.org/documentation/pipelines/design-your-pipeline/) in Apache Beam serves as an abstraction that encapsulates the entirety of a data processing task, including all the data and each step of the process. Essentially, it's a [Directed Acyclic Graph](https://en.wikipedia.org/wiki/Directed_acyclic_graph) of transformations (known as `PTransform`s) applied to collections of data (`PCollection`s).
+
+The simplest pipelines in Apache Beam follow a linear flow of operations, typically adhering to a read-process-write pattern. However, pipelines can also be significantly more complex, featuring multiple input sources, multiple output sinks, and operations (`PTransform`s) that can both read from and output to multiple `PCollection`s.
+
+For more information on pipeline design and best practices, see the [Common Pipeline Patterns](https://beam.apache.org/documentation/patterns/overview/)
+
+To use Beam, your driver program must first create an instance of the Beam SDK class `Pipeline` (typically in the `main()` function).

Review Comment:
   Done
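For readers following the thread, the linear read-process-write flow described in the quoted response can be sketched in plain Python. This is only an illustrative toy model, not the Beam SDK: `ToyPipeline`, `apply`, `run`, and `build_word_length_pipeline` are hypothetical names invented here, and the sketch covers only the simplest linear case, not a general directed acyclic graph.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Tuple

# A step takes one collection and returns a new one (assumption: this loosely
# mirrors a transform applied to a collection; it is NOT the Beam API).
Step = Callable[[Iterable], Iterable]


@dataclass
class ToyPipeline:
    """Hypothetical stand-in for a pipeline: an ordered chain of named steps."""

    steps: List[Tuple[str, Step]] = field(default_factory=list)

    def apply(self, name: str, fn: Step) -> "ToyPipeline":
        # Record the step; nothing is executed until run() is called.
        self.steps.append((name, fn))
        return self

    def run(self, source: Iterable) -> list:
        # Read: start from the source collection.
        data = source
        # Process: feed each step's output into the next step.
        for _name, fn in self.steps:
            data = fn(data)
        # Write: materialize the final collection as a list.
        return list(data)


def build_word_length_pipeline() -> ToyPipeline:
    """Build a three-step linear chain over a collection of words."""
    return (
        ToyPipeline()
        .apply("Normalize", lambda words: (w.strip().lower() for w in words))
        .apply("DropEmpty", lambda words: (w for w in words if w))
        .apply("Measure", lambda words: ((w, len(w)) for w in words))
    )


if __name__ == "__main__":
    pipeline = build_word_length_pipeline()
    print(pipeline.run([" Beam ", "", "Pipeline"]))
    # prints [('beam', 4), ('pipeline', 8)]
```

The toy model defers execution until `run()` is called, so the pipeline is first described as a chain of steps and then executed. In real Beam code, the SDK's `Pipeline` class named in the quoted text plays this role, and a runner handles execution.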
