Polber commented on code in PR #29077:
URL: https://github.com/apache/beam/pull/29077#discussion_r1367218090


##########
sdks/python/apache_beam/yaml/yaml_mapping.md:
##########
@@ -200,3 +200,40 @@ criteria. This can be accomplished with a `Filter` transform, e.g.
     language: sql
     keep: "col2 > 0"
 ```
+
+## Types
+
+Beam will try to infer the types involved in the mappings, but sometimes this
+is not possible. In these cases one can explicitly denote the expected output
+type, e.g.
+
+```
+- type: MapToFields
+  config:
+    language: python
+    fields:
+      new_col:
+        expression: "col1.upper()"
+        type: string
+```
+
+The expected type is given in JSON schema notation, with the addition that
+top-level basic types may be given as a literal string rather than requiring
+a `{type: 'basic_type_name'}` nesting.
+
+```
+- type: MapToFields
+  config:
+    language: python
+    fields:
+      new_col:
+        expression: "col1.upper()"
+        type: string
+      another_col:
+        expression: "beam.Row(a=col1, b=[col2])"
+        type:
+          type: 'object'
+          properties:
+            a: {type: 'string'}
+            b: {type: 'array', items: {type: 'number'}}

Review Comment:
   I like the idea of keeping the type properties as nested (block-style) YAML
   rather than inline JSON-style mappings:
   ```suggestion
         another_col:
           expression: "beam.Row(a=col1, b=[col2])"
           type:
             type: 'object'
             properties:
               a:
                 type: 'string'
               b:
                 type: 'array'
                 items:
                   type: 'number'
   ```
   I also think naming the outer type tag `output_type` (or something similar)
   could make the start of the type config more explicit, though this might be
   too verbose. Also, why is the type `'object'` rather than `'row'`?
   ```suggestion
         another_col:
           expression: "beam.Row(a=col1, b=[col2])"
           output_type:
             type: 'object'
             properties:
               ...
   ```
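
For reference, here is how I read the string-shorthand rule from the doc text, as a minimal Python sketch. `normalize_type_spec` and `BASIC_TYPE_NAMES` are hypothetical names used only for illustration, not Beam's actual implementation, and the set of accepted basic names is an assumption borrowed from JSON Schema's primitive type names.

```python
# Hypothetical sketch, NOT Beam's implementation: expands the top-level
# string shorthand into the nested `{type: ...}` form.

# Assumption: the accepted basic names mirror JSON Schema's non-container
# primitive types.
BASIC_TYPE_NAMES = {"string", "number", "integer", "boolean", "null"}


def normalize_type_spec(spec):
    """Return a JSON-schema-style dict for a field's `type` config value."""
    if isinstance(spec, str):
        if spec not in BASIC_TYPE_NAMES:
            raise ValueError(f"unknown basic type name: {spec!r}")
        # `type: string` is shorthand for `type: {type: 'string'}`.
        return {"type": spec}
    if isinstance(spec, dict):
        # Already in full notation; block-style and flow-style YAML both
        # parse to this same mapping, so no further handling is needed.
        return spec
    raise TypeError(f"type spec must be a string or mapping, got {type(spec)}")


# The two fields from the example in the diff, as parsed mappings.
new_col_type = normalize_type_spec("string")
another_col_type = normalize_type_spec({
    "type": "object",
    "properties": {
        "a": {"type": "string"},
        "b": {"type": "array", "items": {"type": "number"}},
    },
})
```

If the tag names do follow JSON Schema, that may also explain `'object'`: JSON Schema has no `'row'` type, and a record with named fields is spelled `object` there.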


