rszper commented on code in PR #29507: URL: https://github.com/apache/beam/pull/29507#discussion_r1412494324
########## learning/prompts/documentation-lookup/04_basic_pcollections.md: ########## @@ -0,0 +1,25 @@ +Prompt: +What is a PCollection in Apache Beam? +Response: + +A `PCollection` in Apache Beam is an an unordered bag of elements. Each `PCollection` is a potentially distributed, homogeneous data set or data stream, and is owned by the specific `Pipeline` object for which it is created. It’s the primary data structure used in Apache Beam pipelines to handle large-scale data processing in batch and streaming modes. + +Example below shows how to create a `PCollection` using the `Create` transform: Review Comment: ```suggestion The following example shows how to create a `PCollection` using the `Create` transform: ``` ########## learning/prompts/documentation-lookup/05_basic_ptransforms.md: ########## @@ -0,0 +1,34 @@ +Prompt: +What is a PTransform in Apache Beam? +Response: + +A [`PTransform`](https://beam.apache.org/documentation/programming-guide/#transforms) (or transform) represents a data processing operation, or a step, in a Beam pipeline. A transform is applied to zero or more `PCollection` objects and produces zero or more `PCollection` objects. + +Transforms have the following key characteristics: +1. Versatility: Able to execute a diverse range of operations on `PCollection` objects. +2. Composability: Can be combined to form elaborate data processing pipelines. +3. Parallel execution: Designed for distributed processing, allowing simultaneous execution across multiple workers. +4. Scalability: Able to handle extensive data and suitable for both batch and streaming data. + +The Beam SDKs contain different transforms that you can apply to your pipeline’s PCollections. The following list includes common transform types: + - [Source transforms](https://beam.apache.org/documentation/programming-guide/#pipeline-io) such as `TextIO.Read` and `Create`. A source transform conceptually has no input. + - [Processing and conversion operations](https://beam.apache.org/documentation/programming-guide/#core-beam-transforms) such as `ParDo`, `GroupByKey`, `CoGroupByKey`, `Combine`, and `Count`. + - [Outputting transforms](https://beam.apache.org/documentation/programming-guide/#pipeline-io) such as `TextIO.Write`. + - User-defined, application-specific [composite transforms](https://beam.apache.org/documentation/programming-guide/#composite-transforms). + +Transform processing logic is provided in the form of a function object, colloquially referred to as “user code.” This code is applied to each element of the input `PCollection` (or more than one `PCollection`). The `PCollection` objects can be linked together to create complex data processing sequences. +User code for transforms must satisfy the [requirements of the Beam model](https://beam.apache.org/documentation/programming-guide/#requirements-for-writing-user-code-for-beam-transforms). + +Example below shows how to apply custom user code to a `PCollection` using the `ParDo` transform: Review Comment: ```suggestion The following example shows how to apply custom user code to a `PCollection` using the `ParDo` transform: ``` ########## learning/prompts/documentation-lookup/05_basic_ptransforms.md: ########## @@ -0,0 +1,34 @@ +Prompt: +What is a PTransform in Apache Beam? +Response: + +A [`PTransform`](https://beam.apache.org/documentation/programming-guide/#transforms) (or transform) represents a data processing operation, or a step, in a Beam pipeline. A transform is applied to zero or more `PCollection` objects and produces zero or more `PCollection` objects. + +Transforms have the following key characteristics: +1. Versatility: Able to execute a diverse range of operations on `PCollection` objects. +2. Composability: Can be combined to form elaborate data processing pipelines. +3. Parallel execution: Designed for distributed processing, allowing simultaneous execution across multiple workers. +4. Scalability: Able to handle extensive data and suitable for both batch and streaming data. + +The Beam SDKs contain different transforms that you can apply to your pipeline’s PCollections. The following list includes common transform types: Review Comment: ```suggestion The Beam SDKs contain different transforms that you can apply to your pipeline’s `PCollection` objects. The following list includes common transform types: ``` ########## learning/prompts/documentation-lookup/05_basic_ptransforms.md: ########## @@ -0,0 +1,34 @@ +Prompt: +What is a PTransform in Apache Beam? +Response: + +A [`PTransform`](https://beam.apache.org/documentation/programming-guide/#transforms) (or transform) represents a data processing operation, or a step, in a Beam pipeline. A transform is applied to zero or more `PCollection` objects and produces zero or more `PCollection` objects. + +Transforms have the following key characteristics: +1. Versatility: Able to execute a diverse range of operations on `PCollection` objects. +2. Composability: Can be combined to form elaborate data processing pipelines. +3. Parallel execution: Designed for distributed processing, allowing simultaneous execution across multiple workers. +4. Scalability: Able to handle extensive data and suitable for both batch and streaming data. + +The Beam SDKs contain different transforms that you can apply to your pipeline’s PCollections. The following list includes common transform types: Review Comment: ```suggestion The Beam SDKs contain different transforms that you can apply to your pipeline’s `PCollection` objects. The following list includes common transform types: ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
