barrysteyn opened a new issue, #34093:
URL: https://github.com/apache/airflow/issues/34093

   ### Apache Airflow version
   
   Other Airflow 2 version (please specify below)
   
   ### What happened
   
   In #28829, the `allowed_deserialization_classes` regex was made more 
intuitive. It's aim (I guess) was to ensure that a period `.` be treated as a 
period and not as the wildcard regex character. The justification (given in 
#28829) was the following:
   
   > Typically someone would like to allow any classes below 'mymodule' to 
match. For example, 'mymodule.dataclasses' by setting 
allowed_deserialization_classes to 'mymodule.*'. However this matches 
everything starting with mymodule, so also mymodulemalicious.'
   
   To correct the above, the following substitution was provided to each space 
separated string provided by `allowed_deserialization_classes`:
   
   ```
   re.sub(r"(\w)\.", r"\1\..", p)
   ```
   
   Assume I have a class that I want to be serialized called 
`this.is.an.example`. Here is the results of the following inputs to 
`allowed_deserialization_classes`
   
    * `this.is.an.example` --> This would **not match** (note it is the 
identical class name). The regular expression would change it to 
`this\\..is\\..an\\..example`. Which means that there is an extra wildcard `.` 
that would need be matched.
    * `this.is.an*` --> This would **not match**. The regular expression would 
change it to `this\\..is\\..an*` which suffer from the problem in the first 
example.
    * `this.*` --> This would **match** (note that this example is given in the 
unit tests for #28829). The regular expression would would change it to 
`this\\..*`. It will match because it will look for the word *this* followed by 
*.* and then the wildcard character with the wildstar character (which will 
match zero or all of anything). To note that this is *THE* only way to make 
things match without altering the input string.
   
   ### What you think should happen instead
   
   In order to make a match for `this.is.an.example`, I need to put as input in 
`allowed_deserialization_classes` **this[.]is[.]an[.]example**:
    * The above input would be left alone by the regex expression substitution.
    * It also does what the author wants (which is to match the period `.`)
    * To note that one can also just escape the `.` by doing 
**this\.is\.an\.example** which would have the identical result.
   
   In fact, any correct regex would match (even **this[.]is[.]an***) as long as 
`re.sub(r"(\w)\.", r"\1\..", p)` does alter things.
   
   ### What I think should happen
   It is dubious to alter a regex in the first place. Airflow is aimed at 
technical people, and as such, proper documentation would suffice. There I 
suggest the following:
    1. Remove the `re.sub(r"(\w)\.", r"\1\..", p)`
    2. Correct the documentation and warn people to escape the wildcard 
character.
    3. If the intention is still to make it easier (without needing to escape 
the wildcard character, then the regex should be changed to `re.sub(r"(\w)\.", 
r"\1\.", p)` (note the removal of the last period compared to what is in the 
code base now)
   
   ### How to reproduce
   
   I reproduced by copying the logic and putting it in a test task. I did the 
following:
   
   ```python
   @dag(
       dag_id="test-dag-used-to-test",
       start_date=datetime(2021, 1, 1),  # start sometime in the past
       catchup=False,  # important, since we started in the past
   )
   def test_dag():
       @task
       def test_match():
           input = 'this.is.an.example'  # Note that is the exact class name
           p = re.sub(r"(\w)\.", r"\1\..", input)
           result = any(p.match(classname)  # Returns false
   
       test_match()
   
   
   test_dag()
   
   ```
   
   ### Operating System
   
   MacOS
   
   ### Versions of Apache Airflow Providers
   
   The change was introduced in #28829 which was from AirFlow `2.6.0`
   
   ### Deployment
   
   Amazon (AWS) MWAA
   
   ### Deployment details
   
   _No response_
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to