dstandish commented on a change in pull request #19257:
URL: https://github.com/apache/airflow/pull/19257#discussion_r741438344
##########
File path: airflow/models/dag.py
##########
@@ -368,8 +369,11 @@ def __init__(
DeprecationWarning,
stacklevel=2,
)
-
- validate_key(dag_id)
+
+ if not is_ascii(dag_id):
+ # slugify dag id
+ dag_id = slugify(dag_id, lowercase=False)
Review comment:
> I can't immediately see any obvious problem here (of course, when you
integrate with it via the API etc., you need to use a different ID, but when you
query/list etc., you will see the slugified dag id).
>
> Why do you think it will be a problem?
So the issue I see with it is consistency. As a data engineer, a lot of the
job is managing chaos -- you want things to be organized and predictable and
consistent. And so the obvious scenario that comes to mind is, let's say
there's a dag failure you've observed in the UI. The dag_id in the UI will be
`dag_Ni-Hao-_czesc`. It's my job to fix that dag. So I open the code repo in
PyCharm, press `cmd+shift+f`, and search for it. But I won't find it, because
in _the code_ it is defined as `dag_你好_cześć`.
So this would make a data engineer's job more challenging.
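To make the search mismatch concrete, here is a minimal sketch. It uses a stdlib ASCII fold as a stand-in for `slugify` (an assumption: the real `slugify` transliterates characters rather than dropping them, so its output would differ), and `ascii_fold` and `source_line` are hypothetical names used only for illustration.

```python
import unicodedata


def ascii_fold(text: str) -> str:
    """Stand-in for slugify: decompose accents, then drop non-ASCII characters."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


# Hypothetical line from a DAG file, exactly as the author wrote it.
source_line = 'dag = DAG(dag_id="dag_你好_cześć")'

original_id = "dag_你好_cześć"
# What a UI would show if the id were rewritten before being stored.
displayed_id = ascii_fold(original_id)

# Searching the repo for the id as written succeeds...
found_original = original_id in source_line
# ...but searching for the rewritten id does not.
found_displayed = displayed_id in source_line
```

Whatever transliteration is used, the point is the same: once the stored id differs from the literal in the code, a plain text search from the UI value back to the source stops working.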
If full unicode support is not practical in all databases, perhaps it could
be a setting `allow_unicode_dag_ids` or something, and leave it up to the user
to determine whether their database is happy enough with arbitrary unicode dag
ids?
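A rough sketch of what such an opt-in could look like, assuming a boolean `allow_unicode_dag_ids` flag. The function name `validate_dag_id` and both patterns are hypothetical and are not Airflow's actual `validate_key` implementation; in practice the flag would presumably be read from configuration.

```python
import re

# Hypothetical patterns, for illustration only.
ASCII_DAG_ID = re.compile(r"[A-Za-z0-9_.\-]+")
# For str patterns, \w matches Unicode word characters by default.
UNICODE_DAG_ID = re.compile(r"[\w.\-]+")


def validate_dag_id(dag_id: str, allow_unicode_dag_ids: bool = False) -> str:
    """Return dag_id unchanged if it is valid; raise instead of rewriting it."""
    pattern = UNICODE_DAG_ID if allow_unicode_dag_ids else ASCII_DAG_ID
    if pattern.fullmatch(dag_id) is None:
        raise ValueError(f"Invalid dag_id: {dag_id!r}")
    return dag_id


# With the flag on, the id is accepted exactly as written in the code.
accepted = validate_dag_id("dag_你好_cześć", allow_unicode_dag_ids=True)

# With the flag off, the id is rejected rather than silently changed.
try:
    validate_dag_id("dag_你好_cześć")
    rejected = False
except ValueError:
    rejected = True
```

The design choice here is that the id is either accepted verbatim or refused, so what appears in the UI always matches what is in the code.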