This is an automated email from the ASF dual-hosted git repository.

shunping pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git


The following commit(s) were added to refs/heads/master by this push:
     new 0318a2fa701 add yaml agent development skill (#38382)
0318a2fa701 is described below

commit 0318a2fa701281314ae7b1118a1529a06cc720de
Author: Derrick Williams <[email protected]>
AuthorDate: Tue May 19 09:29:23 2026 -0400

    add yaml agent development skill (#38382)
---
 .agent/skills/yaml-development/SKILL.md | 107 ++++++++++++++++++++++++++++++++
 1 file changed, 107 insertions(+)

diff --git a/.agent/skills/yaml-development/SKILL.md 
b/.agent/skills/yaml-development/SKILL.md
new file mode 100644
index 00000000000..c71f91cae00
--- /dev/null
+++ b/.agent/skills/yaml-development/SKILL.md
@@ -0,0 +1,107 @@
+---
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+name: yaml-development
+description: Guides YAML SDK development in Apache Beam, including environment 
setup, testing, and key concepts. Use when working with Beam YAML code in 
sdks/python/apache_beam/yaml/.
+---
+
+# YAML Development in Apache Beam
+
+## Project Structure
+
+### Key Files in `sdks/python/apache_beam/yaml/`
+- `integration_tests.py` - Runs integration tests defined in YAML files or 
using testcontainers.
+- `main.py` - Entry point for running YAML pipelines from the command line.
+- `pipeline.schema.yaml` - JSON schema defining the valid structure for Beam 
YAML pipelines.
+- `standard_io.yaml` - Declarations of standard IO transforms and their 
mappings to providers.
+- `standard_providers.yaml` - Configuration for standard providers (e.g., Java 
expansion services).
+- `yaml_combine.py` - Implementations for aggregation and combining operations.
+- `yaml_io.py` - Mappings and logic for IO transforms (e.g., PubSub, BigQuery, 
Iceberg).
+- `yaml_join.py` - Implementations for join operations.
+- `yaml_mapping.py` - Implementations for mapping operations (e.g., 
`MapToFields`).
+- `yaml_provider.py` - Manages providers (Python, Java cross-language) that 
implement transforms.
+- `yaml_transform.py` - Core YAML expansion logic, parsing, and translation to 
Beam pipelines.
+
+## Environment Setup
+Since Beam YAML is implemented within the Python SDK, the environment setup is 
identical to Python development. Refer to the `python-development` skill for 
details on using `pyenv` and installing in editable mode (e.g., use `pip 
install -e sdks/python[gcp,test]` from the root directory).
+
+## Running YAML Pipelines
+
+You can run Beam YAML pipelines using the `main.py` script in the YAML 
directory.
+
+### Using `main.py` directly
+```bash
+python -m apache_beam.yaml.main --yaml_pipeline_file=/path/to/pipeline.yaml 
[pipeline_options]
+```
+
+### Example: Running locally
+```bash
+python -m apache_beam.yaml.main \
+  
--yaml_pipeline_file=sdks/python/apache_beam/yaml/examples/simple_filter.yaml \
+  --runner=DirectRunner
+```
+
+### Example: Running on Dataflow
+```bash
+python -m apache_beam.yaml.main \
+  
--yaml_pipeline_file=sdks/python/apache_beam/yaml/examples/simple_filter.yaml \
+  --runner=DataflowRunner \
+  --project=my-project \
+  --region=us-central1 \
+  --temp_location=gs://my-bucket/temp
+```
+
+## Running Tests
+
+### Unit Tests
+Beam YAML has extensive unit tests covering parsing, expansion, and specific 
transforms.
+```bash
+# Run all tests in a file
+pytest sdks/python/apache_beam/yaml/yaml_transform_test.py
+
+# Run a specific test
+pytest 
sdks/python/apache_beam/yaml/yaml_transform_test.py::YamlTransformTest::test_simple_pipeline
+```
+
+### Integration Tests
+Integration tests often spin up Docker containers (via `testcontainers`) for 
external services like MongoDB, Kafka, or databases.
+```bash
+# Run integration tests matching a specific keyword (e.g., mongodb)
+pytest sdks/python/apache_beam/yaml/integration_tests.py -k mongodb
+```
+
+## Key Concepts
+
+### Providers
+Beam YAML uses "providers" to find implementations for transforms requested in 
the YAML file.
+- **Inline/Python Providers**: Leverage Python functions or PTransforms 
directly.
+- **Java/External Providers**: Use Beam's cross-language capabilities to 
invoke Java transforms via an expansion service.
+
+### Preprocessing
+Before execution, a YAML pipeline is preprocessed to resolve schemas, match 
transforms to providers, and expand shorthand notations (like `chain` or 
`source`/`sink` composites).
+
+## Common Issues
+
+### Cross-language Failures
+If a test requires a Java transform, ensure that:
+1. Docker is running (if using testcontainers).
+2. The correct expansion service is available or can be started.
+3. Java environment is correctly configured (sometimes requires specific Java 
versions like Java 17/21).
+
+### Schema Mismatches
+YAML relies heavily on Beam schemas. Ensure that fields produced by a 
transform match the fields expected by the next transform. Use explicit mapping 
if necessary.

Reply via email to