Hi Kaxil,

Thanks for the comment. The serialized_dag isn't used to run the task in
the `airflow run --raw` process. It is used in the `airflow run --local`
process to perform `check_and_change_state_before_execution`:
https://github.com/apache/airflow/blob/main/airflow/jobs/local_task_job.py#L88-L99
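
For reference, the fork-versus-exec condition discussed further down the
thread can be sketched roughly as below. This is only an illustration of
the rule as stated here (fork when os.fork is available and run_as_user
isn't set); `can_fork` and `launch_mode` are hypothetical names, not
Airflow functions, and the real logic may differ.

```python
import os
from typing import Optional


def can_fork(run_as_user: Optional[str] = None) -> bool:
    # Hypothetical helper: forking needs os.fork (absent on some platforms)
    # and no user impersonation via run_as_user.
    return hasattr(os, "fork") and run_as_user is None


def launch_mode(run_as_user: Optional[str] = None) -> str:
    # "fork": the child inherits the DAG already loaded by the parent process.
    # "exec": a fresh `airflow run --raw` process starts and has to parse
    #         the DAG file again.
    return "fork" if can_fork(run_as_user) else "exec"


if __name__ == "__main__":
    print(launch_mode())                        # depends on the platform
    print(launch_mode(run_as_user="etl_user"))  # impersonation rules out fork
```

With run_as_user set, the sketch always falls back to "exec", which is the
case where the DAG file ends up being parsed twice.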


Thanks,

Ping


On Mon, Dec 20, 2021 at 4:51 AM Kaxil Naik <[email protected]> wrote:

> Yup, forking only applies when os.fork is available and run_as_user isn't
> specified. We only added to Serialized DAGs the details that are
> needed by the Webserver and for making scheduling decisions in the
> Scheduler.
>
> So it does not contain all the information (all the args, kwargs including
> callables) required to run the task.
>
> Looking forward to the AIP.
>
> Regards,
> Kaxil
>
> On Fri, Dec 17, 2021 at 11:04 PM Ping Zhang <[email protected]> wrote:
>
>> Hi Ash,
>>
>> Thanks for the inputs about the fork approach. I have checked the code.
>> The fork only applies when there is no run_as_user. I think the run_as_user
>> is an important feature.
>>
>> I will create an AIP with more details.
>>
>> Best wishes
>>
>> Ping Zhang
>>
>>
>> On Fri, Dec 17, 2021 at 9:59 AM Jarek Potiuk <[email protected]> wrote:
>>
>>> Yeah. I would also love to see some details in the meeting I proposed
>>> :). I am particularly interested in the current limitations of the
>>> solution in the "general" case.
>>>
>>> J,
>>>
>>> On Fri, Dec 17, 2021 at 11:16 AM Ash Berlin-Taylor <[email protected]>
>>> wrote:
>>> >
>>> > On Thu, Dec 16 2021 at 16:19:45 -0800, Ping Zhang <[email protected]>
>>> wrote:
>>> >
>>> > To run airflow tasks, airflow needs to parse dag file twice, once in
>>> airflow run local process, once in airflow run raw
>>> >
>>> >
>>> > This isn't true in most cases anymore thanks to a change from spawning
>>> a new process (os.exec(["airflow", ...])) to forking instead.
>>> >
>>> > The serialized_dag table doesn't (currently) contain enough
>>> information to actually execute every dag, especially in the case of
>>> PythonOperator, so the actual dag file on disk needs to be loaded to get
>>> the code to run. Perhaps it would be possible to do this for some
>>> operators, but not all.
>>> >
>>> > Still might be worth looking at it and I'm looking forward to the
>>> proposal!
>>> >
>>> > -ash
>>>
>>
