kaxil commented on a change in pull request #19267:
URL: https://github.com/apache/airflow/pull/19267#discussion_r742463759



##########
File path: airflow/serialization/serialized_objects.py
##########
@@ -325,7 +324,7 @@ def _serialize(cls, var: Any) -> Any:  # Unfortunately there is no support for r
         elif isinstance(var, TaskGroup):
             return SerializedTaskGroup.serialize_task_group(var)
         elif isinstance(var, Param):
-            return cls._encode(var.dump(), type_=DAT.PARAM)
+            return cls._encode(cls._serialize_param(var), type_=DAT.PARAM)

Review comment:
       The following code runs JSON Schema validation if a schema is supplied:
   
   
https://github.com/apache/airflow/blob/02b7e2c092902bf42791a5ef1a70bc71226f1e32/airflow/models/param.py#L47-L51
   
   Example test:
   
   
https://github.com/apache/airflow/blob/02b7e2c092902bf42791a5ef1a70bc71226f1e32/tests/models/test_param.py#L58-L66
   
   What I mean is that this won't be possible with a `set`.
   
   For example, the following works:
   
   ```python
   from airflow.models.param import Param

   class TestParam:
       def test_set_param(self):
           p = Param({'a', 'b'})
           assert p.resolve() == {'a', 'b'}
   ```
   
   but there is no way to add validation on top of it, such as checking that the set contains fewer than 10 elements, e.g.:
   
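   ```python
   class TestParam:
       def test_set_param(self):
           # Hypothetical: JSON Schema has no "set" type, so this validation cannot be expressed
           p = Param({'a', 'b'}, type='set', maxItems=9)
           assert p.resolve() == {'a', 'b'}
   ```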
   
   **It is not a big deal, but it is worth pointing out in the docs.**
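
A minimal sketch of the limitation, assuming the standalone `jsonschema` package (which the linked `Param` code uses for validation): JSON Schema has no `set` type, and its closest equivalent is an `array` with `uniqueItems`. The `jsonschema` library's default type checker only treats Python lists as arrays, so a `set` fails the type check before `maxItems` is ever applied. `SCHEMA` and `validate_set` are hypothetical names for illustration, not part of Airflow.

```python
import jsonschema
from jsonschema.exceptions import ValidationError

# JSON Schema has no "set" type; an array with uniqueItems is the closest match.
# maxItems=9 encodes a "fewer than 10 elements" rule.
SCHEMA = {"type": "array", "uniqueItems": True, "maxItems": 9}


def validate_set(value: set) -> list:
    """Hypothetical helper: validate a set by converting it to a sorted list first."""
    as_list = sorted(value)
    jsonschema.validate(as_list, SCHEMA)  # raises ValidationError on failure
    return as_list


# Passing the set directly fails: jsonschema only treats Python lists as arrays.
try:
    jsonschema.validate({"a", "b"}, SCHEMA)
    set_rejected = False
except ValidationError:
    set_rejected = True

print(set_rejected)              # True
print(validate_set({"b", "a"}))  # ['a', 'b']
```

Sorting assumes the elements are mutually comparable and gives a deterministic order; `list(value)` would also work if order does not matter. Either way, the value that is validated is a list, not the original `set`.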

##########
File path: .pre-commit-config.yaml
##########
@@ -195,6 +195,15 @@ repos:
           - "4"
         files: ^chart/values\.schema\.json$|^chart/values_schema\.schema\.json$
         pass_filenames: true
+      - id: pretty-format-json
+        name: Serialization schema
+        args:
+          - --autofix
+          - --no-sort-keys
+          - --indent
+          - "2"

Review comment:
       I remember why we hadn't included this file in the pre-commit hook above yet: we had formatted it in a way that was a bit easier to read.
   
   Example:
   
   
![image](https://user-images.githubusercontent.com/8811558/139959132-c4af66cf-c4d4-469a-b060-a75b0e4c1d08.png)
   
   vs
   
   
![image](https://user-images.githubusercontent.com/8811558/139959156-c8b96447-8890-4b04-bfaf-25a3e35ef2c3.png)
   
   
   However, I have no strong preference.

##########
File path: .pre-commit-config.yaml
##########
@@ -195,6 +195,15 @@ repos:
           - "4"
         files: ^chart/values\.schema\.json$|^chart/values_schema\.schema\.json$
         pass_filenames: true
+      - id: pretty-format-json
+        name: Serialization schema
+        args:
+          - --autofix
+          - --no-sort-keys
+          - --indent
+          - "2"

Review comment:
       Why is this separate from the pre-commit hook above, and why an indent of `2` instead of `4`?
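
To make the trade-off concrete, here is a rough approximation of what the `pretty-format-json` hook does with `--autofix --no-sort-keys`, assuming it parses the file and re-serializes it with Python's `json` module. Any hand-tuned layout is replaced, and only the indent width is configurable. `pretty_format` and the sample document are illustrative assumptions, not taken from the PR.

```python
import json


def pretty_format(contents: str, indent: int) -> str:
    """Rough approximation of pretty-format-json with --no-sort-keys (assumption):
    parse the JSON and re-serialize it, keeping the original key order."""
    return json.dumps(json.loads(contents), indent=indent) + "\n"


# Hypothetical hand-formatted snippet that keeps a short array on one line.
hand_formatted = '{\n    "required": ["a", "b"]\n}\n'

two = pretty_format(hand_formatted, indent=2)
four = pretty_format(hand_formatted, indent=4)

print(two, end="")   # the array is expanded to one item per line, 2-space indent
print(four, end="")  # same layout with 4-space indent
print(hand_formatted == four)  # False: the one-line array is not preserved
```

Both indents produce valid JSON; the choice mainly matters for consistency with the `chart/values.schema.json` hook above, which passes `4`.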

##########
File path: .pre-commit-config.yaml
##########
@@ -195,6 +195,15 @@ repos:
           - "4"
         files: ^chart/values\.schema\.json$|^chart/values_schema\.schema\.json$
         pass_filenames: true
+      - id: pretty-format-json
+        name: Serialization schema
+        args:
+          - --autofix
+          - --no-sort-keys
+          - --indent
+          - "2"

Review comment:
       In any case, please move the formatting and pre-commit changes to a separate PR from the bugfix.

##########
File path: airflow/serialization/serialized_objects.py
##########
@@ -325,7 +324,7 @@ def _serialize(cls, var: Any) -> Any:  # Unfortunately there is no support for r
         elif isinstance(var, TaskGroup):
             return SerializedTaskGroup.serialize_task_group(var)
         elif isinstance(var, Param):
-            return cls._encode(var.dump(), type_=DAT.PARAM)
+            return cls._encode(cls._serialize_param(var), type_=DAT.PARAM)

Review comment:
       Thinking about this more: @msumit, do you think we will be able to validate `set` or other non-JSON structures?

##########
File path: airflow/serialization/serialized_objects.py
##########
@@ -325,7 +324,7 @@ def _serialize(cls, var: Any) -> Any:  # Unfortunately there is no support for r
         elif isinstance(var, TaskGroup):
             return SerializedTaskGroup.serialize_task_group(var)
         elif isinstance(var, Param):
-            return cls._encode(var.dump(), type_=DAT.PARAM)
+            return cls._encode(cls._serialize_param(var), type_=DAT.PARAM)

Review comment:
       Thinking about this more: @msumit, do you think we will be able to validate `set` or other non-JSON structures using JSON Schema validation?

##########
File path: airflow/serialization/serialized_objects.py
##########
@@ -325,7 +324,7 @@ def _serialize(cls, var: Any) -> Any:  # Unfortunately there is no support for r
         elif isinstance(var, TaskGroup):
             return SerializedTaskGroup.serialize_task_group(var)
         elif isinstance(var, Param):
-            return cls._encode(var.dump(), type_=DAT.PARAM)
+            return cls._encode(cls._serialize_param(var), type_=DAT.PARAM)

Review comment:
       The following code runs JSON Schema validation if a schema is supplied:
   
   
https://github.com/apache/airflow/blob/02b7e2c092902bf42791a5ef1a70bc71226f1e32/airflow/models/param.py#L47-L51
   
   Example test:
   
   
https://github.com/apache/airflow/blob/02b7e2c092902bf42791a5ef1a70bc71226f1e32/tests/models/test_param.py#L58-L66
   
   What I mean is that this won't be possible with a `set`.
   
   For example, the following works:
   
   ```python
   from airflow.models.param import Param

   class TestParam:
       def test_set_param(self):
           p = Param({'a', 'b'})
           assert p.resolve() == {'a', 'b'}
   ```
   
   but there is no way to add validation on top of it, such as checking that the set contains fewer than 10 elements, e.g.:
   
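   ```python
   class TestParam:
       def test_set_param(self):
           # Hypothetical: JSON Schema has no "set" type, so this validation cannot be expressed
           p = Param({'a', 'b'}, type='set', maxItems=9)
           assert p.resolve() == {'a', 'b'}
   ```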
   
   **It is not a big deal, but it is worth pointing out in the docs.** That's a separate issue, though, and can be handled in a separate PR.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
