[GitHub] [airflow] dstandish commented on a change in pull request #19257: fix: dag_id and task_id non ascii char

GitBox Tue, 02 Nov 2021 13:28:21 -0700


dstandish commented on a change in pull request #19257:
URL: https://github.com/apache/airflow/pull/19257#discussion_r741438344




##########
File path: airflow/models/dag.py
##########
@@ -368,8 +369,11 @@ def __init__(
                 DeprecationWarning,
                 stacklevel=2,
             )
-
-        validate_key(dag_id)
+        
+        if not is_ascii(dag_id):
+            # slugify dag id
+            dag_id = slugify(dag_id, lowercase=False)

Review comment:
       > I can't immediately see any obvious problem here (of course, when you 
integrate with it via the API, etc., you need to use a different id, but when you 
query/list, etc., you will see the slugified dag id).
   > 
   > Why do you think it will be a problem?
   
   So the issue I see with it is consistency. As a data engineer, a lot of the 
job is managing chaos: you want things to be organized, predictable, and 
consistent. The obvious scenario that comes to mind: say there's a DAG failure 
you've observed in the UI. The dag_id in the UI will be 
`dag_Ni-Hao-_czesc`. It's my job to fix that DAG, so I open the code repo in 
PyCharm, hit `cmd+shift+f`, and search for it. But I won't find it, because 
in _the code_ it is defined as `dag_你好_cześć`.
   
   So this would make a data engineer's job more challenging.
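   To make the search problem concrete, here is a minimal sketch of what a plain-text "find in files" sees. The two ids are taken from the scenario above; the slugified form is hard-coded rather than produced by a slugify library, and `found_by_search` is a hypothetical helper standing in for the IDE search. Python's built-in `str.isascii()` (3.7+) is used in place of the `is_ascii` helper from the diff, whose implementation isn't shown here.

```python
# Hypothetical DAG file contents; only the dag_id literal matters here.
source_code = '''
with DAG(dag_id="dag_你好_cześć") as dag:
    ...
'''

defined_dag_id = "dag_你好_cześć"   # what the author wrote in the code
ui_dag_id = "dag_Ni-Hao-_czesc"     # what the UI would show after slugifying (hard-coded)


def found_by_search(needle: str, haystack: str) -> bool:
    """Mimic a plain substring search, like cmd+shift+f in an IDE."""
    return needle in haystack


print(defined_dag_id.isascii())                       # False: this id would be slugified
print(found_by_search(ui_dag_id, source_code))        # False: the UI id is not in the code
print(found_by_search(defined_dag_id, source_code))   # True: only the original id is
```

   Anything that starts from the UI (searching the repo, grepping logs, copying the id into an API call) has to know about the slug mapping. That hidden mapping is the consistency cost described above.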
   
   If full Unicode support is not practical in all databases, perhaps it could 
be controlled by a setting, `allow_unicode_dag_ids` or something similar, leaving 
it up to the user to decide whether their database handles arbitrary Unicode dag 
ids?
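   One way such a setting could behave is sketched below. `ALLOW_UNICODE_DAG_IDS` and `validate_dag_id` are hypothetical names for the proposed `allow_unicode_dag_ids` idea, not existing Airflow options or functions. The allowed-character patterns are also assumptions chosen for illustration. The design choice shown is to reject a disallowed id with an error rather than silently slugify it, so the id in the UI always matches the id in the code.

```python
import re

# Hypothetical flag modelled on the proposed `allow_unicode_dag_ids` setting;
# this is NOT a real Airflow configuration option.
ALLOW_UNICODE_DAG_IDS = False

# Assumed character sets, for illustration only:
# ASCII letters/digits plus "_", "." and "-" ...
ASCII_KEY_RE = re.compile(r"[A-Za-z0-9_.-]+")
# ... or any Unicode word character plus "." and "-"
# (\w is Unicode-aware for str patterns in Python 3).
UNICODE_KEY_RE = re.compile(r"[\w.-]+")


def validate_dag_id(dag_id: str, allow_unicode: bool = ALLOW_UNICODE_DAG_IDS) -> str:
    """Return dag_id unchanged if allowed; raise instead of rewriting it."""
    pattern = UNICODE_KEY_RE if allow_unicode else ASCII_KEY_RE
    if not pattern.fullmatch(dag_id):
        hint = "" if allow_unicode else " (non-ASCII ids require allow_unicode_dag_ids)"
        raise ValueError(f"dag_id {dag_id!r} is not allowed{hint}")
    return dag_id


print(validate_dag_id("my_dag"))                              # my_dag
print(validate_dag_id("dag_你好_cześć", allow_unicode=True))  # dag_你好_cześć
try:
    validate_dag_id("dag_你好_cześć", allow_unicode=False)
except ValueError as err:
    print(err)
```

   With the flag off, a user with a Unicode-unfriendly database gets an immediate, explicit error at parse time. With it on, the id is stored exactly as written, so a search for the UI id finds the code.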
   




