Re: [PR] Fix broken regex for allowed_deserialization_classes [airflow]

via GitHub Mon, 11 Dec 2023 08:47:35 -0800


potiuk commented on code in PR #36147:
URL: https://github.com/apache/airflow/pull/36147#discussion_r1422794159



##########
airflow/serialization/serde.py:
##########
@@ -360,7 +360,7 @@ def _register():
 @functools.lru_cache(maxsize=None)
 def _get_patterns() -> list[Pattern]:
     patterns = conf.get("core", "allowed_deserialization_classes").split()
-    return [re2.compile(re2.sub(r"(\w)\.", r"\1\..", p)) for p in patterns]
+    return [re2.compile(re2.sub(r"(\w)\.?\*", r"\1\.", p)) for p in patterns]

Review Comment:
   I still find it quite a bit mind boggling and canntot sey that it is wrong 
or right. If we want to keep the flexibility there. I think we should split it 
into two different checks.
   
   1) Check assuming that we are using glob
   
   If not matched:
   
   2) Another check assuming that we are using regexp.
   
   This would be much easier to reason about it an we could also find a 
reasonable way how to explain it to users - because currently it's next to 
impossible to explain this behaviour to the user. 
   
   As the old saying is "you have problems, introduce regexp - now you have two 
problems". In this case I feeel with two regexpe we exponentially grow 
complexity of what happens here.
   
   I think we could explain to the users easily that you can either use glob or 
regexp and both will match. There is a small risk of matching something that we 
did not expect but I think it is very unlikely.
   
   we could also add a config or prefix to make it more deterministic (say 
`deserialization_classes_match`: [regexp, glob, both]` with both being default.
   
   @bolkedebruin  WDYT ? I find it very difficult to reason and confirm if the 
proposed regexp works as expected.
   
   
   
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] Fix broken regex for allowed_deserialization_classes [airflow]

Reply via email to