ryanmadden-google opened a new issue, #34646:
URL: https://github.com/apache/beam/issues/34646

   ### What happened?
   
   The [Beam YAML provider docs](https://beam.apache.org/documentation/sdks/yaml-providers/#yaml) show the following example provider configuration:
   
   ```
   - type: yaml
     transforms:
       # Define the first transform of type "RaiseElementToPower"
       RaiseElementToPower:
         config_schema:
           properties:
             n: {type: integer}
         body:
           type: MapToFields
           config:
             language: python
             append: true
             fields:
               power: "element ** {{n}}"
   
       # Define a second transform that produces consecutive integers.
       Range:
         config_schema:
           properties:
             end: {type: integer}
         # Setting this parameter lets this transform type be used as a source.
         requires_inputs: false
         body: |
           type: Create
           config:
             elements:
               {% for ix in range(end) %}
               - {{ix}}
               {% endfor %}
   ```
   
   The docs then show the provided transforms being used as follows:
   
   ```
   transforms:
     - type: Range
       config:
         end: 10
     - type: RaiseElementToPower
       input: Range
       config:
         n: 3
     ...
   ```
   
   However, providing and using `RaiseElementToPower` in this manner results in an error. For example, if the provider stanza is saved in `provider.yaml` and the following pipeline is run:
   
   ```
   pipeline:
     type: chain
     transforms:
       - type: Range
         config:
           end: 4
       - type: RaiseElementToPower
         config:
           n: 2
       - type: LogForTesting
   providers:
     - include: provider.yaml
   ```
   
   Then the following error occurs: `ValueError: Invalid transform specification at "RaiseElementToPower" at line 7: Missing inputs for transform at "MapToFields" at line 1`
   
   This error also occurs when the pipeline does not use `type: chain` and instead specifies each transform's inputs explicitly.
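   
   A sketch of what such a non-chain pipeline might look like is given below. It is not the exact pipeline that was run; the explicit `input` wiring is an assumption modeled on the usage example in the docs:
   
   ```yaml
   # Illustrative sketch only: the same pipeline without `type: chain`.
   # The `input` entries are assumed, following the docs' usage example.
   pipeline:
     transforms:
       - type: Range
         config:
           end: 4
       - type: RaiseElementToPower
         input: Range
         config:
           n: 2
       - type: LogForTesting
         input: RaiseElementToPower
   providers:
     - include: provider.yaml
   ```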
   
   The format of the provider `body` definition seems to be the source of the issue. The related [tests in the source](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/yaml/yaml_provider_unit_test.py) do not exercise this style of provider definition, in which `body` is a plain YAML mapping describing a single transform. They cover only the block string literal style used in `Range` above and the `chain` style. For example:
   
   ```
   ...
             body:
               type: chain
               transforms:
                 - type: MapToFields
                   config:
                     language: python
                     append: true
                     fields:
                       power: "element**{{n}}"
   ```
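   
   For reference, a possible workaround would be to write the documented `RaiseElementToPower` provider in the `chain` style, as sketched below. This is untested here: it assumes the `chain` body style covered by the unit tests behaves the same way in an included provider file. The `config_schema` and field expression are copied from the docs example.
   
   ```yaml
   # Untested sketch: the docs' RaiseElementToPower provider with its body
   # wrapped in a `chain`, mirroring the style the unit tests cover.
   - type: yaml
     transforms:
       RaiseElementToPower:
         config_schema:
           properties:
             n: {type: integer}
         body:
           type: chain
           transforms:
             - type: MapToFields
               config:
                 language: python
                 append: true
                 fields:
                   power: "element ** {{n}}"
   ```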
   
   
   ### Issue Priority
   
   Priority: 2 (default / most bugs should be filed as P2)
   
   ### Issue Components
   
   - [ ] Component: Python SDK
   - [ ] Component: Java SDK
   - [ ] Component: Go SDK
   - [ ] Component: Typescript SDK
   - [ ] Component: IO connector
   - [x] Component: Beam YAML
   - [ ] Component: Beam examples
   - [ ] Component: Beam playground
   - [ ] Component: Beam katas
   - [ ] Component: Website
   - [ ] Component: Infrastructure
   - [ ] Component: Spark Runner
   - [ ] Component: Flink Runner
   - [ ] Component: Samza Runner
   - [ ] Component: Twister2 Runner
   - [ ] Component: Hazelcast Jet Runner
   - [ ] Component: Google Cloud Dataflow Runner


