Hi,

I would vote for: `[Vote -1] To use display_name along with dag_id as DAG
params`.

`dag_id` is a fundamental core concept in the system, and changing it might
have repercussions. Another advantage of `display_name` is that it lets the
UI show DAGs under a more meaningful, human-readable name.
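
To make the idea concrete, here is a minimal sketch in plain Python (not
Airflow code). The `DagLabel` class, its `label` property, and the
allowed-character pattern are all hypothetical names chosen for illustration;
the real parameter name and validation rules would be whatever the PR settles
on.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical rule for illustration: ASCII letters, digits, "_", "." and "-".
_ASCII_ID = re.compile(r"[A-Za-z0-9_.-]+")


@dataclass(frozen=True)
class DagLabel:
    """Pairs a stable ASCII identifier with an optional free-form display name."""

    dag_id: str
    display_name: Optional[str] = None

    def __post_init__(self) -> None:
        # The identifier stays ASCII, so file names, metrics and DB keys are unaffected.
        if not _ASCII_ID.fullmatch(self.dag_id):
            raise ValueError(f"dag_id must be ASCII-only: {self.dag_id!r}")

    @property
    def label(self) -> str:
        # The UI would show the display name when set, else fall back to the id.
        return self.display_name or self.dag_id


report = DagLabel(dag_id="sales_report", display_name="销售报告")
print(report.dag_id, "->", report.label)
```

Everything that keys on `dag_id` keeps working unchanged, and only the code
that renders a name for humans needs to read `label`.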

Thanks,

Ping


On Thu, Jan 19, 2023 at 3:47 PM Jarek Potiuk <[email protected]> wrote:

> One more thing here. I think historically we also have other places, which
> we don't fully realise, where we rely on dag_id being ASCII.
>
> One example is https://github.com/apache/airflow/issues/18010, where a
> non-ASCII dag_id causes the airflow scheduler to crash (and in an
> irrecoverable way at that) when statsd is enabled. Another example is the
> log file names (there are still filesystems that do not support unicode, or
> that use a different encoding). There are also a number of edge cases
> (can we have ":" in the DAG_ID, or "';"? Or even a space?). Those are not
> insurmountable of course - for example we could also slugify the statsd
> label. But they open up a Pandora's box of problems that we
> won't be able to fully unit test and prevent - because we can't reasonably
> test all kinds of combinations of filesystems, encodings, telemetry systems
> etc. And we will have plenty of errors reported after we release it.
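
For reference, a rough sketch of what "slugify the statsd label" could look
like, using only the Python standard library. The function name
`slugify_stat_name`, the allowed-character set, and the `"unnamed"` fallback
are assumptions made for illustration, not what Airflow does.

```python
import re
import unicodedata


def slugify_stat_name(name: str) -> str:
    """Reduce an arbitrary string to a conservative ASCII metric label."""
    # Decompose accented characters (e.g. "é" -> "e" + combining accent),
    # then drop every code point that has no ASCII representation.
    ascii_only = (
        unicodedata.normalize("NFKD", name)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Collapse runs of anything outside the allow-list (":", ";", spaces, ...)
    # into a single underscore and trim underscores from both ends.
    slug = re.sub(r"[^A-Za-z0-9_.-]+", "_", ascii_only).strip("_")
    # Hypothetical fallback when nothing survives, e.g. an all-CJK name.
    return slug or "unnamed"


for raw in ["tést", "my dag:1;", "数据"]:
    print(repr(raw), "->", slugify_stat_name(raw))
```

Note that the mapping is lossy: "tést" and "test" both become `test`, and
every all-CJK id collapses to the same fallback, so a slug alone cannot serve
as a unique key. That is one concrete instance of the edge cases described
above.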
>
> I personally think that if we continue insisting on "let's make the dag_id
> unicode" we might keep bumping into issues we do not understand or
> realise, and that might mean more issues, more frustrated users, more
> support and diagnostics for the community. While I see how "idealistic"
> the approach of making the id fully "internationalizable" is (hey, we are
> in 2023, we should be able to handle non-ASCII in an id), pragmatically
> speaking, adding an extra "display" name while keeping the original dag_id
> as an ASCII unique identifier might be much more pragmatic and attainable.
>
> I remember similar discussions happening for more than a year now: any
> attempt to add an "extra" display was effectively blocked by strong
> opposition ("hey, we should make the dag_id unicode") ... and nothing
> happened. Thus effectively our users cannot really use non-ASCII characters
> in dag_id.
>
> We might continue having the idea that the dag_id "can be non-ASCII", but
> I think it is quite difficult to pull off and no-one has the courage to try
> it and take responsibility to tackle all the issues (some of which we do
> not even know).
>
> I would rather attempt it in a pragmatic way - even if it means we need to
> add an extra "display" entity. It seems more predictable in terms of the
> problems our users will experience.
>
> J,
>
>
> On Mon, Jan 16, 2023 at 9:19 PM Jarek Potiuk <[email protected]> wrote:
>
>> > Possibly contentious idea: We allow unicode dag_ids for Postgres, MSSQL
>> > (and SQLite) but for MySQL we enforce it as ASCII only.
>>
>> If we can make it easy and make sure it is handled well and we do not
>> have extra maintenance - yes, I think it is a viable solution. One more
>> reason for those who use MySQL to migrate :D
>>
>>
>>
>> On Thu, Jan 12, 2023 at 10:57 PM Ash Berlin-Taylor <[email protected]>
>> wrote:
>>
>>> The description of option 1 is wrong/misleading. It sounds like you are
>>> proposing DAG(name="tést"), but what actually happens is that it gets
>>> magically changed to `test` behind the user's back. -1 veto to that.
>>>
>>> So it's Option 2 or some combo of the "break MySQL in some way" options.
>>>
>>> -ash
>>>
>>> On Jan 12 2023, at 9:50 pm, Ash Berlin-Taylor <[email protected]> wrote:
>>>
>>>
>>> Possibly contentious idea: We allow unicode dag_ids for Postgres, MSSQL
>>> (and SQLite) but for MySQL we enforce it as ASCII only.
>>>
>>> On Jan 12 2023, at 6:15 pm, Jarek Potiuk <[email protected]> wrote:
>>>
>>> As I mentioned multiple times in similar discussions, we have a huge
>>> problem with unicode in dag_id, namely the MySQL limit on index size. We
>>> would have to shorten the ID significantly in the database to work around
>>> that limit.
>>>
>>> We can indulge in wishful thinking that we can change dag_id to unicode,
>>> but until someone solves the problem, it is just that: wishful thinking.
>>>
>>> If someone has a proposal for how to do it without breaking compatibility
>>> or enormously complicating the MySQL case (or if we get a proposal to drop
>>> MySQL) - I would also be for what Daniel said. But so far I have not seen
>>> any.
>>>
>>> So in the absence of a viable way to add unicode to dag_id (which
>>> currently IMHO is not an option) my vote goes to 2.
>>>
>>> We can also drop MySQL support :D
>>>
>>> J.
>>>
>>>
>>>
>>>
>>> On Thu, Jan 12, 2023 at 9:56 AM Ash Berlin-Taylor <[email protected]>
>>> wrote:
>>>
>>> +1 to what Daniel said
>>>
>>> On 12 January 2023 08:32:29 GMT, Daniel Standish
>>> <[email protected]> wrote:
>>>
>>> 1. appears to have a potential fix:
>>> https://github.com/apache/airflow/issues/21127#issuecomment-1030673862
>>> 2. seems to fail due to our own ascii enforcement... what if we remove
>>> that?
>>> 3. does not appear to be unicode-related or dag_id-related but a feature
>>> request for user-friendly mapped task aliases...
>>>
>>> not saying we should not add a "name" of some kind... but ... does not
>>> yet seem clear we can't just enable unicode...  i know others have given
>>> this much more thought than I and maybe they can chime in with other
>>> concerns we may have encountered as this idea has bounced around
>>>
>>> On Thu, Jan 12, 2023 at 12:01 AM Abdul Hadi Shakir <
>>> [email protected]> wrote:
>>>
>>> Directly using non-ASCII characters (Unicode included) in *dag_id* breaks
>>> a couple of features. See issues:
>>>
>>>    - Fail to download task log if there are Chinese characters in
>>>    dag_id #21127 <https://github.com/apache/airflow/issues/21127>
>>>    - Airflow scheduler with statsd enabled crashes when dag_id contains
>>>    unexpected characters #18010
>>>    <https://github.com/apache/airflow/issues/18010>
>>>    - Names for expanded tasks #23020
>>>    <https://github.com/apache/airflow/issues/23020>
>>>
>>> *Abdul Hadi Shakir*
>>>
>>>
>>> On Thu, Jan 12, 2023 at 1:19 PM Daniel Standish
>>> <[email protected]> wrote:
>>>
>>> Hi,
>>>
>>> Is it not possible to just have unicode dag_id with no distinct "name"?
>>> If you explored this route and encountered problems which caused you to
>>> abandon, can you share what were the problems?
>>>
>>> I think having just one ID for a dag is a nice thing, if we can keep it.
>>>
>>> On Wed, Jan 11, 2023 at 11:43 PM Abdul Hadi Shakir <
>>> [email protected]> wrote:
>>>
>>> Hi team,
>>>
>>> While discussing the approach for
>>> https://github.com/apache/airflow/issues/22073 (adding support for
>>> national characters in DAG display name), two approaches came up. We need
>>> votes to settle on one of the two:
>>>
>>>    1. [Vote *+1*] Using *name* as the only parameter, and then
>>>    generating a unique *dag_id* from it using *slugify*. This makes the
>>>    interface simpler, but it makes *dag_id* unknown to the users.
>>>    Ongoing PR for this: https://github.com/apache/airflow/pull/28183
>>>    2. [Vote -*1*] To use *display_name* along with *dag_id* as DAG
>>>    params. While this is a simpler solution on the backend, it needs a lot
>>>    of work on the frontend for a consistent experience. Ongoing PR for this:
>>>    https://github.com/apache/airflow/pull/27145
>>>
>>> Cheers,
>>> *Abdul Hadi Shakir*
>>>
>>>
