potiuk commented on code in PR #36147:
URL: https://github.com/apache/airflow/pull/36147#discussion_r1422794159
##########
airflow/serialization/serde.py:
##########
@@ -360,7 +360,7 @@ def _register():
@functools.lru_cache(maxsize=None)
def _get_patterns() -> list[Pattern]:
patterns = conf.get("core", "allowed_deserialization_classes").split()
- return [re2.compile(re2.sub(r"(\w)\.", r"\1\..", p)) for p in patterns]
+ return [re2.compile(re2.sub(r"(\w)\.?\*", r"\1\.", p)) for p in patterns]
Review Comment:
I still find it quite a bit mind boggling and canntot sey that it is wrong
or right. If we want to keep the flexibility there. I think we should split it
into two different checks.
1) Check assuming that we are using glob
If not matched:
2) Another check assuming that we are using regexp.
This would be much easier to reason about it an we could also find a
reasonable way how to explain it to users - because currently it's next to
impossible to explain this behaviour to the user.
As the old saying is "you have problems, introduce regexp - now you have two
problems". In this case I feeel with two regexpe we exponentially grow
complexity of what happens here.
I think we could explain to the users easily that you can either use glob or
regexp and both will match. There is a small risk of matching something that we
did not expect but I think it is very unlikely.
we could also add a config or prefix to make it more deterministic (say
`deserialization_classes_match`: [regexp, glob, both]` with both being default.
@bolkedebruin WDYT ? I find it very difficult to reason and confirm if the
proposed regexp works as expected.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]