Hi Abdul,

Friendly bump on this thread. Do we have an agreement on which route we
are going to take?
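
For reference, option 2 (an ASCII `dag_id` plus a separate `display_name`)
could look roughly like the sketch below. `DagLabel` and its `label` property
are hypothetical names used only for illustration; this is not the code from
either PR, just the shape of the idea under the assumption that the UI falls
back to `dag_id` when no display name is set.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DagLabel:
    """Hypothetical stand-in for the two DAG params discussed in option 2."""

    dag_id: str                         # ASCII-only unique identifier
    display_name: Optional[str] = None  # free-form, human-readable UI label

    def __post_init__(self) -> None:
        # Keep the identifier ASCII so the DB, log paths and metrics
        # keep seeing the same kind of value they see today.
        if not self.dag_id.isascii():
            raise ValueError(f"dag_id must be ASCII, got {self.dag_id!r}")

    @property
    def label(self) -> str:
        # Fall back to dag_id when no display name was given.
        return self.display_name or self.dag_id


dag = DagLabel(dag_id="daily_sales", display_name="每日销售报告")
print(dag.dag_id, dag.label)
```

Everything that needs a stable key would keep using `dag_id`; only the UI
would read `label`.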

Thanks,

Ping


On Fri, Jan 20, 2023 at 3:50 PM Ping Zhang <[email protected]> wrote:

> Hi,
>
> I would vote for: `[Vote -1] To use display_name along with dag_id as DAG
> params`.
>
> `dag_id` is a fundamental core concept in the system, and changing it might
> have repercussions. Having `display_name` has another advantage: it allows
> the UI to show a more meaningful, human-readable name for DAGs.
>
> Thanks,
>
> Ping
>
>
> On Thu, Jan 19, 2023 at 3:47 PM Jarek Potiuk <[email protected]> wrote:
>
>> One more thing here. I think historically we also have other places, which
>> we don't fully realise, where we rely on dag_id being ASCII.
>>
>> One example is https://github.com/apache/airflow/issues/18010, where a
>> non-ASCII dag_id causes the Airflow scheduler to crash (and in an
>> irrecoverable way at that) when statsd is enabled. Another example is the
>> log file names (there are still filesystems that do not support unicode, or
>> that use a different encoding). There are also a number of edge cases
>> (can we have ":" in the dag_id, or "';"? or even a space?). Those are not
>> insurmountable of course - for example we could also slugify the statsd
>> label. But then we are opening up a Pandora's box of problems that we
>> won't be able to fully unit test and prevent - because we can't reasonably
>> test all kinds of combinations of filesystems, encodings, telemetry systems
>> etc. And we will have plenty of errors reported after we release it.
>>
>> I personally think that if we continue insisting on "let's make the
>> dag_id unicode" we might keep bumping into issues we do not
>> understand or realise, and that might mean more issues, more frustrated
>> users, more support and diagnostics for the community. While I see the
>> appeal of the "idealistic" approach of making the id fully
>> "internationalizable" (hey, we are in 2023, we should be able to handle
>> non-ASCII as an id), adding an extra "display" name while keeping the
>> original dag_id as an ASCII unique identifier might be much more pragmatic
>> and attainable.
>>
>> I remember similar discussions happening for more than a year now. Any
>> attempt to add an "extra" display was effectively blocked by strong
>> opposition ("hey, we should make the dag_id unicode") ... and nothing
>> happened. Thus, effectively, our users still cannot really use non-ASCII
>> characters in dag_id.
>>
>> We might continue having the idea that the dag_id "can be non-ASCII", but
>> I think it is quite difficult to pull off and no-one has the courage to try
>> it and take responsibility to tackle all the issues (some of which we do
>> not even know).
>>
>> I would rather attempt it in a pragmatic way - even if it means we need
>> to add an extra "display" entity. That seems more predictable in terms of
>> the problems our users will experience.
>>
>> J.
>>
>>
>> On Mon, Jan 16, 2023 at 9:19 PM Jarek Potiuk <[email protected]> wrote:
>>
>>> > Possibly contentious idea: We allow unicode dag_ids for Postgres,
>>> MSSQL (and SQLite) but for MySQL we enforce it as ASCII only.
>>>
>>> If we can make it easy, make sure it is handled well, and do not take
>>> on extra maintenance - yes, I think it is a viable solution. One more
>>> reason for those who use MySQL to migrate :D
>>>
>>>
>>>
>>> On Thu, Jan 12, 2023 at 10:57 PM Ash Berlin-Taylor <[email protected]>
>>> wrote:
>>>
>>>> The description of option 1 is wrong/misleading. It sounds like you are
>>>> proposing DAG(name="tést"), but what actually happens is that it gets
>>>> magically changed to `test` behind the user's back. -1 veto to that.
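
A minimal sketch of the behaviour objected to above, using only the standard
library. `ascii_slug` is a hypothetical helper and an assumption about what
"slugify" roughly does; the slugify used in the PR may behave differently
(for example, it may transliterate characters rather than drop them).

```python
import re
import unicodedata


def ascii_slug(name: str) -> str:
    """Illustrative ASCII slug; not necessarily what the PR's slugify does."""
    # Decompose accented characters, then drop everything that is not ASCII.
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Collapse runs of non-alphanumeric characters into a single underscore.
    return re.sub(r"[^A-Za-z0-9]+", "_", ascii_only).strip("_").lower()


for name in ["tést", "test", "每日销售报告"]:
    print(repr(name), "->", repr(ascii_slug(name)))
```

With this helper, "tést" and "test" end up with the same generated id, and a
name with no ASCII-representable characters yields an empty string, so a
derived dag_id would need extra handling for collisions and empty results.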
>>>>
>>>> So it's Option 2 or some combo of the "break MySQL in some way" options.
>>>>
>>>> -ash
>>>>
>>>> On Jan 12 2023, at 9:50 pm, Ash Berlin-Taylor <[email protected]> wrote:
>>>>
>>>>
>>>> Possibly contentious idea: We allow unicode dag_ids for Postgres, MSSQL
>>>> (and SQLite) but for MySQL we enforce it as ASCII only.
>>>>
>>>> On Jan 12 2023, at 6:15 pm, Jarek Potiuk <[email protected]> wrote:
>>>>
>>>> As I mentioned multiple times in similar discussions, we have a huge
>>>> problem with unicode in dag_id: namely, the MySQL limit on indexes. We
>>>> would have to shorten the id significantly in the database to work around
>>>> MySQL limits on index size.
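
One way to read the index-size concern above is as simple arithmetic. The
numbers below are assumptions for illustration and are not stated in this
thread: a 250-character id column, a 767-byte per-column index key limit (as
on older InnoDB row formats), and 3 or 4 bytes per character depending on the
MySQL character set.

```python
# All three constants are assumptions for illustration, not taken from the thread.
ID_LEN = 250                 # assumed length of the dag_id column, in characters
INDEX_KEY_LIMIT_BYTES = 767  # assumed per-column index key limit, in bytes
MAX_BYTES_PER_CHAR = {"utf8mb3": 3, "utf8mb4": 4}


def index_key_bytes(charset: str, length: int = ID_LEN) -> int:
    """Worst-case bytes needed to index `length` characters in `charset`."""
    return length * MAX_BYTES_PER_CHAR[charset]


def max_indexable_chars(charset: str) -> int:
    """Longest id, in characters, that still fits under the assumed limit."""
    return INDEX_KEY_LIMIT_BYTES // MAX_BYTES_PER_CHAR[charset]


for charset in MAX_BYTES_PER_CHAR:
    needed = index_key_bytes(charset)
    fits = needed <= INDEX_KEY_LIMIT_BYTES
    print(charset, needed, "bytes;", "fits" if fits else "too large",
          "; max chars:", max_indexable_chars(charset))
```

Under these assumptions, a 4-byte character set pushes a 250-character id
past the limit, which is where the "shorten the id" trade-off comes from.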
>>>>
>>>> We can wish that we could change dag_id to unicode, but until someone
>>>> solves the problem this is just that: wishful thinking.
>>>>
>>>> If someone has a proposal for how to do it without breaking compatibility
>>>> or enormously complicating the MySQL case (or a proposal to drop MySQL) -
>>>> I would also be for what Daniel said. But so far I have not seen any.
>>>>
>>>> So in the absence of a viable way to add unicode to dag_id (which
>>>> currently IMHO is not an option) my vote goes to 2.
>>>>
>>>> We can also drop MySQL support :D
>>>>
>>>> J.
>>>>
>>>>
>>>>
>>>>
>>>> On Thu, Jan 12, 2023 at 9:56 AM Ash Berlin-Taylor <[email protected]>
>>>> wrote:
>>>>
>>>> +1 to what Daniel said
>>>>
>>>> On 12 January 2023 08:32:29 GMT, Daniel Standish
>>>> <[email protected]> wrote:
>>>>
>>>> 1. appears to have a potential fix:
>>>> https://github.com/apache/airflow/issues/21127#issuecomment-1030673862
>>>> 2. seems to fail due to our own ascii enforcement... what if we remove
>>>> that?
>>>> 3. does not appear to be unicode-related or dag_id-related but a
>>>> feature request for user-friendly mapped task aliases...
>>>>
>>>> not saying we should not add a "name" of some kind... but ... does not
>>>> yet seem clear we can't just enable unicode...  i know others have given
>>>> this much more thought than I and maybe they can chime in with other
>>>> concerns we may have encountered as this idea has bounced around
>>>>
>>>> On Thu, Jan 12, 2023 at 12:01 AM Abdul Hadi Shakir <
>>>> [email protected]> wrote:
>>>>
>>>> Directly using non-ASCII characters (unicode included) in *dag_id* breaks
>>>> a couple of features. See issues:
>>>>
>>>>    - Fail to download task log if there are Chinese characters in
>>>>    dag_id #21127 <https://github.com/apache/airflow/issues/21127>
>>>>    - Airflow scheduler with statsd enabled crashes when dag_id
>>>>    contains unexpected characters #18010
>>>>    <https://github.com/apache/airflow/issues/18010>
>>>>    - Names for expanded tasks #23020
>>>>    <https://github.com/apache/airflow/issues/23020>
>>>>
>>>> *Abdul Hadi Shakir*
>>>>
>>>>
>>>> On Thu, Jan 12, 2023 at 1:19 PM Daniel Standish
>>>> <[email protected]> wrote:
>>>>
>>>> Hi,
>>>>
>>>> Is it not possible to just have unicode dag_id with no distinct
>>>> "name"?  If you explored this route and encountered problems which caused
>>>> you to abandon, can you share what were the problems?
>>>>
>>>> I think having just one ID for a dag is a nice thing, if we can keep it.
>>>>
>>>> On Wed, Jan 11, 2023 at 11:43 PM Abdul Hadi Shakir <
>>>> [email protected]> wrote:
>>>>
>>>> Hi team,
>>>>
>>>> While discussing the approach for
>>>> https://github.com/apache/airflow/issues/22073 (adding support for
>>>> national characters in DAG display name), two approaches came up. We
>>>> need votes to settle on one of the two:
>>>>
>>>>    1. [Vote *+1*] Using *name* as the only parameter, and then
>>>>    generating a unique *dag_id* from it using *slugify*. This makes
>>>>    the interface simpler, but it makes *dag_id* unknown to the
>>>>    users. Ongoing PR for this:
>>>>    https://github.com/apache/airflow/pull/28183
>>>>    2. [Vote *-1*] To use *display_name* along with *dag_id* as DAG
>>>>    params. While this is a simpler solution on the backend, it needs lots
>>>>    of work on the frontend for a consistent experience. Ongoing PR for
>>>>    this: https://github.com/apache/airflow/pull/27145
>>>>
>>>> Cheers,
>>>> *Abdul Hadi Shakir*
>>>>
>>>>
