olehborysevych commented on code in PR #30440:
URL: https://github.com/apache/beam/pull/30440#discussion_r1507648707


##########
learning/prompts/documentation-lookup-nolinks/50_beam_yaml.md:
##########
@@ -0,0 +1,65 @@
+Prompt: Can I create Apache Beam pipelines using YAML?

Review Comment:
   Add a line break.



##########
learning/prompts/documentation-lookup-nolinks/50_beam_yaml.md:
##########
@@ -0,0 +1,65 @@
+Prompt: Can I create Apache Beam pipelines using YAML?
+
+Response: The Beam YAML API enables the creation of basic Apache Beam 
pipelines using YAML syntax, which can be edited with any text editor. This 
intuitive, declarative syntax simplifies pipeline authoring by eliminating the 
need for coding experience or SDK familiarity. Manipulating and understanding 
YAML semantics is easier compared to working directly with Beam protos. 
Additionally, YAML's readability and ease of authorship make it a suitable 
intermediate representation for tools such as pipeline authoring GUIs or 
lineage analysis tools.
+
+The Beam YAML parser is currently integrated into the Apache Beam Python SDK.
+
+The Beam YAML SDK offers a wide range of built-in transforms and allows the 
creation of custom transforms through the notion of `providers` that leverage 
expansion services and schema transforms, or by integrating Python packages 
into your pipeline. You can give names to transforms to help you monitor, 
debug, and disambiguate identical transforms, enhancing readability.
+
+Beam YAML enables authoring both linear and non-linear pipelines. For linear 
pipelines, Beam YAML allows the inputs to be implicit by specifying the 
pipeline as a `chain` type. Additionally, you can denote the first and last 
transforms as `source` and `sink`, respectively. For non-linear pipelines, 
inputs must be explicitly named, although `chains` can be nested within them.
+
+The following Beam YAML code snippet demonstrates a linear pipeline that reads 
data from a CSV file, filters rows based on a specified condition, executes an 
SQL query, and writes the resulting data to a JSON file:
+
+```yaml
+pipeline:
+  type: chain
+
+  source:
+    type: ReadFromCsv
+    config:
+      path: /path/to/input*.csv
+
+  transforms:
+    - type: Filter
+      config:
+        language: python
+        keep: "col3 > 100"
+
+    - type: Sql
+      name: MySqlTransform
+      config:
+        query: "select col1, count(*) as cnt from PCOLLECTION group by col1"
+
+  sink:
+    type: WriteToJson
+    config:
+      path: /path/to/output.json
+```
+
+With Beam YAML, you can define both streaming and batch pipelines. Beam YAML 
also supports Apache Beam’s windowing and triggering, allowing windowing to be 
declared using the standard `WindowInto` transform or by tagging a transform 
with specified windowing. The latter applies the windowing to its inputs and 
the transform itself.
+
+The following example demonstrates a streaming Beam YAML pipeline with a chain 
of transforms. It reads from a Pub/Sub topic, applies a grouping transform with 
sliding windowing, and then writes the output to another Pub/Sub topic:
+
+```yaml
+pipeline:
+  type: chain
+  transforms:
+    - type: ReadFromPubSub
+      config:
+        topic: myPubSubTopic
+        format: ...
+        schema: ...
+    - type: SomeGroupingTransform
+      config:
+        arg: ...
+      windowing:
+        type: sliding
+        size: 60s
+        period: 10s
+    - type: WriteToPubSub
+      config:
+        topic: anotherPubSubTopic
+        format: json
+```
+
+You can execute a pipeline defined in a YAML file using the standard `python 
-m` command.

Review Comment:
   Please add how you specify a YAML file for execution.
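   
   For example, something along these lines. This is a sketch only: I believe the entry point is the `apache_beam.yaml.main` module and the flag is `--yaml_pipeline_file` (older releases spelled it `--pipeline_spec_file`), but please verify both against the SDK version the prompt targets. The path is a placeholder.
   
   ```sh
   # Assumed invocation: run a Beam YAML pipeline from a file.
   # Other pipeline options (such as the runner) would follow the file flag.
   python -m apache_beam.yaml.main --yaml_pipeline_file=/path/to/pipeline.yaml
   ```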



##########
learning/prompts/documentation-lookup-nolinks/50_beam_yaml.md:
##########
@@ -0,0 +1,65 @@
+Prompt: Can I create Apache Beam pipelines using YAML?
+
+Response: The Beam YAML API enables the creation of basic Apache Beam 
pipelines using YAML syntax, which can be edited with any text editor. This 
intuitive, declarative syntax simplifies pipeline authoring by eliminating the 
need for coding experience or SDK familiarity. Manipulating and understanding 
YAML semantics is easier compared to working directly with Beam protos. 
Additionally, YAML's readability and ease of authorship make it a suitable 
intermediate representation for tools such as pipeline authoring GUIs or 
lineage analysis tools.

Review Comment:
   Add a line break.



##########
learning/prompts/documentation-lookup-nolinks/50_beam_yaml.md:
##########
@@ -0,0 +1,65 @@
+Prompt: Can I create Apache Beam pipelines using YAML?
+
+Response: The Beam YAML API enables the creation of basic Apache Beam 
pipelines using YAML syntax, which can be edited with any text editor. This 
intuitive, declarative syntax simplifies pipeline authoring by eliminating the 
need for coding experience or SDK familiarity. Manipulating and understanding 
YAML semantics is easier compared to working directly with Beam protos. 
Additionally, YAML's readability and ease of authorship make it a suitable 
intermediate representation for tools such as pipeline authoring GUIs or 
lineage analysis tools.

Review Comment:
   I suggest replacing the word "syntax" with "file": it is the file or 
document that gets edited, not the syntax.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

