Hi Ash,

 

Thanks for the response.

 

About suggestion 2:

> 2. Yes, we should avoid doing this. Do we still do this anywhere?

To be honest, I haven't fully understood the `airflow` source code yet, so I
can't say for sure. Maybe we can add a check to make sure we don't do it
anywhere.
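
To illustrate the back-compat-only behaviour you describe, here is a rough sketch in Python. It is not taken from the `airflow` source: the function name `resolve_run_type`, the `KNOWN_TYPES` tuple and the `prefix__rest` layout are all assumptions I made up for the example. The idea is that the stored `run_type` always wins, and the `run_id` prefix is only consulted for old rows that have no `run_type`.

```python
from typing import Optional

# Illustrative values only; the real set of run types may differ.
KNOWN_TYPES = ("scheduled", "manual", "backfill")


def resolve_run_type(run_type: Optional[str], run_id: str) -> Optional[str]:
    """Prefer the stored run_type; parse the run_id prefix only as a fallback."""
    if run_type is not None:
        # Normal path: the run_id value carries no meaning here.
        return run_type
    # Legacy path (hypothetical): rows created before the column existed.
    prefix = run_id.split("__", 1)[0]
    return prefix if prefix in KNOWN_TYPES else None


print(resolve_run_type("scheduled", "manual__2020-08-18T02:10:00+00:00"))
print(resolve_run_type(None, "backfill__2020-08-18T02:10:00+00:00"))
print(resolve_run_type(None, "my_custom_id"))
```

With something like this in one place, any other code that parses the `run_id` would be easy to spot in review.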

 

About suggestion 3 (localize the `run_id`):

I think the project should cover all the use cases.

If an installation operates across more than one `TZ`, the `run_id` should
use UTC time.

But if the users are all in a single `TZ`, I think we should give them an
option to localize the `run_id`.

So in the PR, I added a config option that lets users choose.
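
To make the idea concrete, here is a minimal sketch in Python. It is not the code in the PR: `make_run_id`, the `local_tz` parameter (standing in for the config option) and the `prefix__timestamp` layout are assumptions for illustration only. It uses the example from my first mail, a run triggered at 10:10 in a +8:00 time zone.

```python
from datetime import datetime, timedelta, timezone, tzinfo
from typing import Optional


def make_run_id(prefix: str, when: datetime, local_tz: Optional[tzinfo] = None) -> str:
    """Build a run_id from a prefix and a timezone-aware timestamp.

    local_tz stands in for the (hypothetical) config choice:
    None keeps UTC, any tzinfo localizes the timestamp.
    """
    if when.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    target = local_tz if local_tz is not None else timezone.utc
    return f"{prefix}__{when.astimezone(target).isoformat()}"


# Triggered at 10:10 local time (+8:00), which is 02:10 UTC.
triggered = datetime(2020, 8, 18, 2, 10, tzinfo=timezone.utc)
plus_eight = timezone(timedelta(hours=8))

utc_id = make_run_id("manual", triggered)
local_id = make_run_id("manual", triggered, plus_eight)
print(utc_id)    # manual__2020-08-18T02:10:00+00:00
print(local_id)  # manual__2020-08-18T10:10:00+08:00
```

Both strings refer to the same instant; keeping the UTC offset in the localized value means installs that span several time zones can still read it unambiguously.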

 

On 2021/08/18 21:43:44, Ash Berlin-Taylor <[email protected]> wrote: 

> Hi Lionel,

> 

> Great questions, most of them are for historic reasons.

> 

> Getting run_type from run_id: should only be used for back-compat -- 

> the run_type column didn't used to exist (it was only added about 6-9 

> months ago from my rough memory) but going forward the "prefix" on 

> run_id has no meaning anymore, run_type is all that matters.

> 

> run_id vs execution_date: I have plans (and I'm slowly working towards 

> this) to make execution_date /not/ unique on the dag_run. For example 

> let's say you have two (or n) models you want to try out and see which 

> performs better. To really compare them you need them to operate on the 

> same data, so ideally that means the same execution_date.

> 

> run_id is just meant to be that -- an identifier. Its exact value 

> holds _no_ meaning to Airflow anymore, and we are free to have it take 

> whatever value makes most sense to a user.

> 

> As to your suggestions:

> 1. Yes, more clear docs would always be good

> 2. Yes, we should avoid doing this. Do we still do this anywhere?

> 3. As per your PR, I think making the behaviour configurable makes 

> sense -- as some Airflow installs operate "across" more than one TZ, so 

> having them all be UTC might be a good option there.

> 

> Thanks,

> Ash

> 

> 

> On Wed, Aug 18 2021 at 10:33:41 +0800, Lionel Zhao 

> <[email protected]> wrote:

> > Hi guys,

> > 

> > When I tried out Airflow, I found that the dag 

> > `run_id` shown on the page is in UTC, while my time zone is +8:00, 

> > which makes it quite hard to tell exactly which run is which.

> > 

> > For example, I trigger a dag run at ‘2020-08-18 10:10:00’ but the 

> > dag `run_id` is `2020-08-18 02:10:00`.

> > 

> > So I created a PR here: https://github.com/apache/airflow/pull/17502 

> > to localize the dag `run_id`; the PR is WIP for now.

> > 

> > But I think we should have a discussion about the `run_id`. I found 

> > the definition of the `run_id` quite confusing when I read 

> > the source.

> > 

> > There are 2 points:

> > 

> > 1. Most of the time we use the `execution_date` to query the 

> > dag_runs, and there is also a UNIQUE_KEY(dag_id + execution_date), 

> > so why do we still need another key to query by? In fact, the 

> > `execution_date` could already serve as the `run_id`, so we don't 

> > need a separate `run_id`. If the purpose of the `run_id` is to let 

> > the user know exactly when the task ran, it is in UTC, which is 

> > very hard for users to use.

> > 

> > 2. In some places we get the run_type from the `run_id`, but we 

> > haven't set a clear rule for the `run_id`. This is a risk for the 

> > future, because it is a hidden rule about the dag `run_id`.

> > 

> > My suggestions:

> > 

> > 1. We should clarify the definition of the `run_id` 

> > and set a clear rule for it.

> > 

> > 2. Avoid getting the `run_type` from the `run_id` 

> > and only use the `run_type` in the dag_run.

> > 

> > 3. Change the `run_id` to local time so that the 

> > user can easily see the exact run time.

> > 

> > 

> > 

> > 

> > 

> > Just opening a wider discussion; let me know what you think.

> > 

> > Thanks a lot

> > 

> > 

> > 

> > 

> > 

> > From,

> > 

> > Lionel Zhao

> > 

> > 

> > 

> 

> 
