potiuk commented on a change in pull request #19257:
URL: https://github.com/apache/airflow/pull/19257#discussion_r741690599



##########
File path: airflow/models/dag.py
##########
@@ -368,8 +369,11 @@ def __init__(
                 DeprecationWarning,
                 stacklevel=2,
             )
-
-        validate_key(dag_id)
+        
+        if not is_ascii(dag_id):
+            # slugify dag id
+            dag_id = slugify(dag_id, lowercase=False)

Review comment:
       Yeah, I hear you @dstandish. And I am indeed more convinced we should 
probably try to do as you say (the problem with unicode support for MySQL is 
mostly about very obscure characters - one that I know for sure is that one 
that someone tried to use in the past and caused some mysql issues: 🦅. Quite 
unlikely to be useful for engineering consistency.
   
   But this approach is a bit risly. We would  have to modify some basic 
premise we have with the IDs.
   I wonder what's the reason we have this (@Melodie97 - this is the reason why 
'*' is not allowed). I have a feeling that we need to understand why we have 
those limitations on dag id in the first place. 
   
   ```
   KEY_REGEX = re.compile(r'^[\w.-]+$')
   GROUP_KEY_REGEX = re.compile(r'^[\w-]+$')
   
   
   def validate_key(k: str, max_length: int = 250):
       """Validates value used as a key."""
       if not isinstance(k, str):
           raise TypeError(f"The key has to be a string and is {type(k)}:{k}")
       if len(k) > max_length:
           raise AirflowException(f"The key has to be less than {max_length} 
characters")
       if not KEY_REGEX.match(k):
           raise AirflowException(
               f"The key ({k}) has to be made of alphanumeric characters, 
dashes, "
               f"dots and underscores exclusively"
           )
   
   
   def validate_group_key(k: str, max_length: int = 200):
       """Validates value used as a group key."""
       if not isinstance(k, str):
           raise TypeError(f"The key has to be a string and is {type(k)}:{k}")
       if len(k) > max_length:
           raise AirflowException(f"The key has to be less than {max_length} 
characters")
       if not GROUP_KEY_REGEX.match(k):
           raise AirflowException(
               f"The key ({k!r}) has to be made of alphanumeric characters, 
dashes and underscores exclusively"
           )
   ```
   
   BTW. @Melodie97  - this is one of the examples where seemingly simple change 
sparked discussion and it's quite unlikely we come to a conclusion quickly :). 
So rather than fix it for Outreachy it's good if you rather contribute to the 
discussion (if you can provide your opinion on that and got enough 
understanding (or maybe take a look at the code to try to figure out why we 
have the limitation at all. That STILL is a contribution (even if it is 
unlikely to produce a merged change soon). 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to