Hi Abdul, Friendly bump this thread. Do we have an agreement on which route we are going to take?
Thanks, Ping On Fri, Jan 20, 2023 at 3:50 PM Ping Zhang <[email protected]> wrote: > Hi, > > I would vote for: `[Vote -1] To use display_name along with dag_id as DAG > params`. > > `dag_id` is a fundamental core concept in the system, changing that might > have some repercussions. There is also another advantage of having the > display_name, which allows the UI to display a more meaningful and > human-readable format of dags. > > Thanks, > > Ping > > > On Thu, Jan 19, 2023 at 3:47 PM Jarek Potiuk <[email protected]> wrote: > >> One more thing here. I think historically we also have other places we >> don't fully realise where we rely on dag_id being ASCII. >> >> Example here https://github.com/apache/airflow/issues/18010 where >> non-ASCII dag_id causes airflow scheduler to crash (and in >> irrecoverable way at that) when statsd is enabled. Another example are the >> log file names (there are still filesystems that do not support unicode or >> they can have different encoding. But also there are a number of edge cases >> (can we have ":" in the DAG_ID, or "';" ? or even space? Those are not >> insurmountable of course - for example we could also slugify the statsd >> label. But then those are opening up pandora's box of problems that we >> won't be able to fully unit test and prevent - because we can't reasonably >> test all kinds of combinations of filesystems, encoding, telemetry systems >> etc. etc. And we will have plenty of errors reported after we release it. >> >> I personally think that if we continue insisting on "let's make the >> dag_id unicode" we might keep on bumping on those issues we do not >> understand or realise and that might mean - more issues, more frustrated >> users, more support and diagnostics for the community. While I see how >> "idealistic" approach on getting the id being fully "internationalizable" >> is (hey we are in 2023, we should be able to handle non-ASCII as id), >> pragmatically speaking, getting extra "display" while keeping the original >> dag_id as ASCII unique identifier might be much pragmatic and attainable. >> >> I remember similar discussions happening for > 1 year now, any attempt to >> add an "extra" display was effectively blocked by a strong opposition "hey >> we should make the dag_id unicode" ... and nothing happened. Thus >> effectively our users cannot use non-ASCII characters in dag_id really. >> >> We might continue having the idea that the dag_id "can be non-ASCII", but >> I think it is quite difficult to pull off and no-one has the courage to try >> it and take responsibility to tackle all the issues (some of which we do >> not even know). >> >> I would rather attempt it in a pragmatic way - even if it means we need >> to add an extra "display" entity. Seems like more predictable in terms of >> problems our users will experience. >> >> J, >> >> >> On Mon, Jan 16, 2023 at 9:19 PM Jarek Potiuk <[email protected]> wrote: >> >>> > Possibly contentious idea: We allow unicode dag_ids for Postgres, >>> MSQQL (and sqllite) but for Mysql we enforce it as ASCII only. >>> >>> If we can make it easy and make sure it is handled well and we do not >>> have extra maintenance - yes, I think it is viable solution. One more >>> reason for those who use MySQL to migrate :D >>> >>> >>> >>> On Thu, Jan 12, 2023 at 10:57 PM Ash Berlin-Taylor <[email protected]> >>> wrote: >>> >>>> The description of option 1 is wrong/misleading. It sounds like you are >>>> proposing DAG(name="tést") but what actually happens its that gets >>>> magically changed to `test` behind the users back. -1 veto to that. >>>> >>>> So it's Option2 or some combo of the "break MySQL in some way" options. >>>> >>>> -ash >>>> >>>> On Jan 12 2023, at 9:50 pm, Ash Berlin-Taylor <[email protected]> wrote: >>>> >>>> >>>> Possibly contentious idea: We allow unicode dag_ids for Postgres, MSQQL >>>> (and sqllite) but for Mysql we enforce it as ASCII only. >>>> >>>> On Jan 12 2023, at 6:15 pm, Jarek Potiuk <[email protected]> wrote: >>>> >>>> As I mentioned multiple times in similar discussions We have a huge >>>> problem with unicode in dag_id. Namely MySQL limit on indexes. We would >>>> have to shorten the Id significantly in the database to workaround MySQL >>>> limits for index size. >>>> >>>> We can have a wishful thinking that we can change dag_id to unicode but >>>> until someone solves the problem - this is just this - wishful thinking. >>>> >>>> If someone has a proposal how to do it without breaking compatibility >>>> or enormously complicating mysql case (or if we drop mysql proposal) - I >>>> would also be for what Daniel said. But so far I have not seen any. >>>> >>>> So in the absence of a viable way to add unicode to dag_id (which >>>> currently IMHO is not an option) my vote goes to 2. >>>> >>>> We can also drop MySQL support :D >>>> >>>> J. >>>> >>>> >>>> >>>> >>>> On Thu, Jan 12, 2023 at 9:56 AM Ash Berlin-Taylor <[email protected]> >>>> wrote: >>>> >>>> +1 to what Daniel said >>>> >>>> On 12 January 2023 08:32:29 GMT, Daniel Standish >>>> <[email protected]> wrote: >>>> >>>> 1 appears to have potential fix: >>>> https://github.com/apache/airflow/issues/21127#issuecomment-1030673862 >>>> 2. seems to fail due to our own ascii enforcement... what if we remove >>>> that? >>>> 3. does not appear to be unicode-related or dag_id-related but a >>>> feature request for user-friendly mapped task aliases... >>>> >>>> not saying we should not add a "name" of some kind... but ... does not >>>> yet seem clear we can't just enable unicode... i know others have given >>>> this much more thought than I and maybe they can chime in with other >>>> concerns we may have encountered as this idea has bounced around >>>> >>>> On Thu, Jan 12, 2023 at 12:01 AM Abdul Hadi Shakir < >>>> [email protected]> wrote: >>>> >>>> Directly using non-ascii characters (unicodes included) in *dag_id* breaks >>>> couples of functionalities. See issues: >>>> >>>> - Fail to download task log if there are Chinese characters in >>>> dag_id #21127 <https://github.com/apache/airflow/issues/21127> >>>> - Airflow scheduler with statsd enabled crashes when dag_id >>>> contains unexpected characters #18010 >>>> <https://github.com/apache/airflow/issues/18010> >>>> - Names for expanded tasks #23020 >>>> <https://github.com/apache/airflow/issues/23020> >>>> >>>> *Abdul Hadi Shakir* >>>> >>>> >>>> On Thu, Jan 12, 2023 at 1:19 PM Daniel Standish >>>> <[email protected]> wrote: >>>> >>>> Hi, >>>> >>>> Is it not possible to just have unicode dag_id with no distinct >>>> "name"? If you explored this route and encountered problems which caused >>>> you to abandon, can you share what were the problems? >>>> >>>> I think having just one ID for a dag is a nice thing, if we can keep it. >>>> >>>> On Wed, Jan 11, 2023 at 11:43 PM Abdul Hadi Shakir < >>>> [email protected]> wrote: >>>> >>>> Hi team, >>>> >>>> While discussing the approach for >>>> https://github.com/apache/airflow/issues/22073 (adding support for >>>> national characters in DAG display name) - two approaches came out. Need >>>> votes to finalise on one of the two: >>>> >>>> 1. [Vote *+1*] Using *name* as the only parameter; and then >>>> generating a unique *dag_id* from it using *slugify*. This makes >>>> the interface simpler; but it makes *dag_id* unknown from the >>>> users. Ongoing PR for this: >>>> https://github.com/apache/airflow/pull/28183 >>>> 2. [Vote -*1*] To use *display_name* along with *dag_id* as DAG >>>> params. While this is a simpler solution on the backend - it needs lots >>>> of >>>> work on the frontend for a consistent experience. Ongoing PR for this: >>>> https://github.com/apache/airflow/pull/27145 >>>> 3. >>>> >>>> Cheers, >>>> *Abdul Hadi Shakir* >>>> >>>>
