Yup, forking only applies when os.fork is available and run_as_user isn't
specified. We only added enough detail to Serialized DAGs for the Webserver
to render them and for the Scheduler to make scheduling decisions.
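
Roughly, the decision looks like this (a simplified sketch, not the actual
Airflow code; run_task_in_child and spawn_raw_task_process are illustrative
placeholders):

    import os

    def run_task_in_child():
        # Placeholder: execute the already-parsed task in the forked child.
        print("running task in forked child, pid=%s" % os.getpid())

    def spawn_raw_task_process(run_as_user):
        # Placeholder: launch a fresh "airflow ... run --raw" process
        # (e.g. via subprocess, with sudo -u when run_as_user is set).
        print("would spawn a new process as user %r" % run_as_user)

    def start_task(run_as_user=None):
        # Forking is only an option when the platform provides os.fork
        # and we don't need to impersonate a different OS user.
        if hasattr(os, "fork") and run_as_user is None:
            pid = os.fork()
            if pid == 0:
                # Child process: the DAG is already parsed in memory,
                # so no second parse of the DAG file is needed.
                run_task_in_child()
                os._exit(0)
        else:
            # Fallback: start a brand new process, which has to re-parse
            # the DAG file from scratch.
            spawn_raw_task_process(run_as_user)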

So Serialized DAGs do not contain all the information (all the args and
kwargs, including callables) required to actually run a task.
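
As a concrete example (illustrative; the exact serialized JSON layout
differs):

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def my_callable():
        print("hello")

    with DAG("example", start_date=datetime(2021, 1, 1)) as dag:
        PythonOperator(task_id="t1", python_callable=my_callable)

    # What lands in the serialized_dag table is JSON metadata, roughly
    # {"task_id": "t1", "_task_type": "PythonOperator", ...}. The
    # python_callable itself is not stored, so the Webserver/Scheduler
    # can render and schedule t1, but executing it requires re-importing
    # this file to recover my_callable.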

Looking forward to the AIP.

Regards,
Kaxil

On Fri, Dec 17, 2021 at 11:04 PM Ping Zhang <[email protected]> wrote:

> Hi Ash,
>
> Thanks for the input about the fork approach. I have checked the code:
> the fork only applies when there is no run_as_user, and I think
> run_as_user is an important feature.
>
> I will create an AIP with more details.
>
> Best wishes
>
> Ping Zhang
>
>
> On Fri, Dec 17, 2021 at 9:59 AM Jarek Potiuk <[email protected]> wrote:
>
>> Yeah. I would also love to see some details in the meeting I proposed
>> :). I am particularly interested in the current limitations of the
>> solution in the "general" case.
>>
>> J,
>>
>> On Fri, Dec 17, 2021 at 11:16 AM Ash Berlin-Taylor <[email protected]>
>> wrote:
>> >
>> > On Thu, Dec 16 2021 at 16:19:45 -0800, Ping Zhang <[email protected]>
>> wrote:
>> >
>> > To run an Airflow task, Airflow needs to parse the DAG file twice: once
>> > in the "airflow run" local process, and once in "airflow run --raw".
>> >
>> >
>> > This isn't true in most cases anymore, thanks to a change from spawning
>> > a new process (roughly os.execvp("airflow", ["airflow", ...])) to forking
>> > instead.
>> >
>> > The serialized_dag table doesn't (currently) contain enough information
>> > to actually execute every dag, especially in the case of PythonOperator,
>> > so the actual dag file on disk needs to be loaded to get the code to run.
>> > Perhaps it would be possible to do this for some operators, but not all.
>> >
>> > Still, it might be worth looking at, and I'm looking forward to the
>> > proposal!
>> >
>> > -ash
>>
>
