Hi, I would vote for: `[Vote -1] To use display_name along with dag_id as DAG params`.
`dag_id` is a fundamental core concept in the system, changing that might have some repercussions. There is also another advantage of having the display_name, which allows the UI to display a more meaningful and human-readable format of dags. Thanks, Ping On Thu, Jan 19, 2023 at 3:47 PM Jarek Potiuk <[email protected]> wrote: > One more thing here. I think historically we also have other places we > don't fully realise where we rely on dag_id being ASCII. > > Example here https://github.com/apache/airflow/issues/18010 where > non-ASCII dag_id causes airflow scheduler to crash (and in > irrecoverable way at that) when statsd is enabled. Another example are the > log file names (there are still filesystems that do not support unicode or > they can have different encoding. But also there are a number of edge cases > (can we have ":" in the DAG_ID, or "';" ? or even space? Those are not > insurmountable of course - for example we could also slugify the statsd > label. But then those are opening up pandora's box of problems that we > won't be able to fully unit test and prevent - because we can't reasonably > test all kinds of combinations of filesystems, encoding, telemetry systems > etc. etc. And we will have plenty of errors reported after we release it. > > I personally think that if we continue insisting on "let's make the dag_id > unicode" we might keep on bumping on those issues we do not understand or > realise and that might mean - more issues, more frustrated users, more > support and diagnostics for the community. While I see how "idealistic" > approach on getting the id being fully "internationalizable" is (hey we are > in 2023, we should be able to handle non-ASCII as id), pragmatically > speaking, getting extra "display" while keeping the original dag_id as > ASCII unique identifier might be much pragmatic and attainable. > > I remember similar discussions happening for > 1 year now, any attempt to > add an "extra" display was effectively blocked by a strong opposition "hey > we should make the dag_id unicode" ... and nothing happened. Thus > effectively our users cannot use non-ASCII characters in dag_id really. > > We might continue having the idea that the dag_id "can be non-ASCII", but > I think it is quite difficult to pull off and no-one has the courage to try > it and take responsibility to tackle all the issues (some of which we do > not even know). > > I would rather attempt it in a pragmatic way - even if it means we need to > add an extra "display" entity. Seems like more predictable in terms of > problems our users will experience. > > J, > > > On Mon, Jan 16, 2023 at 9:19 PM Jarek Potiuk <[email protected]> wrote: > >> > Possibly contentious idea: We allow unicode dag_ids for Postgres, MSQQL >> (and sqllite) but for Mysql we enforce it as ASCII only. >> >> If we can make it easy and make sure it is handled well and we do not >> have extra maintenance - yes, I think it is viable solution. One more >> reason for those who use MySQL to migrate :D >> >> >> >> On Thu, Jan 12, 2023 at 10:57 PM Ash Berlin-Taylor <[email protected]> >> wrote: >> >>> The description of option 1 is wrong/misleading. It sounds like you are >>> proposing DAG(name="tést") but what actually happens its that gets >>> magically changed to `test` behind the users back. -1 veto to that. >>> >>> So it's Option2 or some combo of the "break MySQL in some way" options. >>> >>> -ash >>> >>> On Jan 12 2023, at 9:50 pm, Ash Berlin-Taylor <[email protected]> wrote: >>> >>> >>> Possibly contentious idea: We allow unicode dag_ids for Postgres, MSQQL >>> (and sqllite) but for Mysql we enforce it as ASCII only. >>> >>> On Jan 12 2023, at 6:15 pm, Jarek Potiuk <[email protected]> wrote: >>> >>> As I mentioned multiple times in similar discussions We have a huge >>> problem with unicode in dag_id. Namely MySQL limit on indexes. We would >>> have to shorten the Id significantly in the database to workaround MySQL >>> limits for index size. >>> >>> We can have a wishful thinking that we can change dag_id to unicode but >>> until someone solves the problem - this is just this - wishful thinking. >>> >>> If someone has a proposal how to do it without breaking compatibility or >>> enormously complicating mysql case (or if we drop mysql proposal) - I would >>> also be for what Daniel said. But so far I have not seen any. >>> >>> So in the absence of a viable way to add unicode to dag_id (which >>> currently IMHO is not an option) my vote goes to 2. >>> >>> We can also drop MySQL support :D >>> >>> J. >>> >>> >>> >>> >>> On Thu, Jan 12, 2023 at 9:56 AM Ash Berlin-Taylor <[email protected]> >>> wrote: >>> >>> +1 to what Daniel said >>> >>> On 12 January 2023 08:32:29 GMT, Daniel Standish >>> <[email protected]> wrote: >>> >>> 1 appears to have potential fix: >>> https://github.com/apache/airflow/issues/21127#issuecomment-1030673862 >>> 2. seems to fail due to our own ascii enforcement... what if we remove >>> that? >>> 3. does not appear to be unicode-related or dag_id-related but a feature >>> request for user-friendly mapped task aliases... >>> >>> not saying we should not add a "name" of some kind... but ... does not >>> yet seem clear we can't just enable unicode... i know others have given >>> this much more thought than I and maybe they can chime in with other >>> concerns we may have encountered as this idea has bounced around >>> >>> On Thu, Jan 12, 2023 at 12:01 AM Abdul Hadi Shakir < >>> [email protected]> wrote: >>> >>> Directly using non-ascii characters (unicodes included) in *dag_id* breaks >>> couples of functionalities. See issues: >>> >>> - Fail to download task log if there are Chinese characters in >>> dag_id #21127 <https://github.com/apache/airflow/issues/21127> >>> - Airflow scheduler with statsd enabled crashes when dag_id contains >>> unexpected characters #18010 >>> <https://github.com/apache/airflow/issues/18010> >>> - Names for expanded tasks #23020 >>> <https://github.com/apache/airflow/issues/23020> >>> >>> *Abdul Hadi Shakir* >>> >>> >>> On Thu, Jan 12, 2023 at 1:19 PM Daniel Standish >>> <[email protected]> wrote: >>> >>> Hi, >>> >>> Is it not possible to just have unicode dag_id with no distinct "name"? >>> If you explored this route and encountered problems which caused you to >>> abandon, can you share what were the problems? >>> >>> I think having just one ID for a dag is a nice thing, if we can keep it. >>> >>> On Wed, Jan 11, 2023 at 11:43 PM Abdul Hadi Shakir < >>> [email protected]> wrote: >>> >>> Hi team, >>> >>> While discussing the approach for >>> https://github.com/apache/airflow/issues/22073 (adding support for >>> national characters in DAG display name) - two approaches came out. Need >>> votes to finalise on one of the two: >>> >>> 1. [Vote *+1*] Using *name* as the only parameter; and then >>> generating a unique *dag_id* from it using *slugify*. This makes the >>> interface simpler; but it makes *dag_id* unknown from the users. >>> Ongoing PR for this: https://github.com/apache/airflow/pull/28183 >>> 2. [Vote -*1*] To use *display_name* along with *dag_id* as DAG >>> params. While this is a simpler solution on the backend - it needs lots >>> of >>> work on the frontend for a consistent experience. Ongoing PR for this: >>> https://github.com/apache/airflow/pull/27145 >>> 3. >>> >>> Cheers, >>> *Abdul Hadi Shakir* >>> >>>
