[ https://issues.apache.org/jira/browse/AIRFLOW-160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16164511#comment-16164511 ]
ASF subversion and git services commented on AIRFLOW-160:
---------------------------------------------------------
Commit 028b3b88ff4f191c78bf1d9c41bf43a792f640ff in incubator-airflow's branch
refs/heads/master from [~ashb]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=028b3b8 ]
[AIRFLOW-1606][Airflow-1606][AIRFLOW-1605][AIRFLOW-160] DAG.sync_to_db is now a normal method
Previously it was a static method that took a DAG as its first argument,
which really meant it wasn't truly a static method.
To avoid reversing the parameter order, I have given the parameters sensible
defaults, taken from the one and only use in the rest of the code base.
Also remove the documentation for the "sync_to_db" parameter on DagBag,
which no longer exists: the docstring refers to a parameter that was removed
in [AIRFLOW-160].
Closes #2605 from ashb/AIRFLOW-1606-db-sync_to_db-not-static
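A minimal sketch of the shape of this change, using a hypothetical `ToyDAG` class and an in-memory dict in place of Airflow's real `DAG` model and metadata database. The defaults shown (`owner` falling back to the DAG's own owner, `sync_time` to the current UTC time) are assumptions for illustration, not necessarily the exact ones chosen in the commit.

```python
from datetime import datetime, timezone


class ToyDAG:
    """Hypothetical stand-in for Airflow's DAG class (not the real API)."""

    # Hypothetical in-memory "metadata DB": dag_id -> (owner, sync_time).
    synced = {}

    def __init__(self, dag_id, owner="airflow"):
        self.dag_id = dag_id
        self.owner = owner

    # Before: declared static, but every caller passed a DAG as the first
    # argument, so it was really an instance method in disguise.
    @staticmethod
    def sync_to_db_static(dag, owner, sync_time):
        ToyDAG.synced[dag.dag_id] = (owner, sync_time)

    # After: a normal method. `self` replaces the old first argument, so the
    # remaining parameters keep their order and gain (assumed) defaults.
    def sync_to_db(self, owner=None, sync_time=None):
        if owner is None:
            owner = self.owner
        if sync_time is None:
            sync_time = datetime.now(timezone.utc)
        ToyDAG.synced[self.dag_id] = (owner, sync_time)


dag = ToyDAG("example_dag", owner="alice")
ToyDAG.sync_to_db_static(dag, dag.owner, datetime.now(timezone.utc))  # old call style
dag.sync_to_db()  # new call style: defaults fill in owner and sync_time
```

Keeping `owner` and `sync_time` in their original positions means existing keyword or positional callers only lose the leading DAG argument.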
> Parse DAG files through child processes
> ---------------------------------------
>
> Key: AIRFLOW-160
> URL: https://issues.apache.org/jira/browse/AIRFLOW-160
> Project: Apache Airflow
> Issue Type: Improvement
> Components: scheduler
> Reporter: Paul Yang
> Assignee: Paul Yang
>
> Currently, the Airflow scheduler parses all user DAG files in the same
> process as the scheduler itself. We've seen issues in production where bad
> DAG files cause the scheduler to fail. For example, if a user script calls
> `sys.exit(1)`, the scheduler exits as well. We've also seen an unusual case
> where modules loaded by a user DAG affected the operation of the scheduler.
> For better uptime, the scheduler should be resilient to these problematic
> user DAGs.
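The `sys.exit(1)` case is hard to guard against in-process because `sys.exit` raises `SystemExit`, which derives from `BaseException` rather than `Exception`, so the usual `except Exception` wrapped around user code does not catch it. A small illustration, where `parse_in_process` is a hypothetical helper and not Airflow's actual parser:

```python
def parse_in_process(source):
    """Hypothetical naive parser: run DAG file source in the caller's process."""
    namespace = {}
    try:
        exec(compile(source, "<dag file>", "exec"), namespace)
    except Exception as exc:  # the kind of guard often wrapped around user code
        return f"caught {type(exc).__name__}"
    return "parsed"


GOOD_DAG_FILE = "x = 1\n"
RAISING_DAG_FILE = "raise ValueError('bad DAG')\n"
BAD_DAG_FILE = "import sys\nsys.exit(1)\n"

outcome = None
try:
    outcome = parse_in_process(BAD_DAG_FILE)
except SystemExit as exc:
    # Without this outer handler the SystemExit would end the whole process,
    # which in the scheduler's case means the scheduler itself.
    outcome = f"escaped with SystemExit({exc.code})"

print(parse_in_process(GOOD_DAG_FILE))     # parsed
print(parse_in_process(RAISING_DAG_FILE))  # caught ValueError
print(outcome)                             # escaped with SystemExit(1)
```

An ordinary exception is contained by the guard, but `SystemExit` sails past it; catching `BaseException` instead would also swallow `KeyboardInterrupt`, which is one reason process isolation is the more robust fix.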
> The proposed solution is to parse and schedule user DAGs through child
> processes. This way, the main scheduler process is more isolated from bad
> DAGs. There's a side benefit as well: since parsing is distributed among
> multiple processes, it's possible to parse the DAG files more frequently,
> reducing the latency between when a DAG is modified and when the changes are
> picked up.
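One way to sketch the proposed isolation with only the standard library: run each file in a fresh Python interpreter via `subprocess`, fan the files out over a thread pool so several children parse concurrently, and treat a non-zero exit code as a failed parse rather than a scheduler crash. This is not Airflow's implementation; `parse_in_child`, `parse_all`, `ParseResult`, and the convention that a "DAG" is any module-level dict with a `dag_id` key are all hypothetical.

```python
import json
import os
import subprocess
import sys
import tempfile
from concurrent.futures import ThreadPoolExecutor
from typing import List, NamedTuple, Optional

# Code run inside each child interpreter: execute the DAG file, then report the
# ids of module-level dicts carrying a "dag_id" key (a hypothetical stand-in
# for collecting real DAG objects).
_CHILD_SCRIPT = (
    "import json, runpy, sys\n"
    "ns = runpy.run_path(sys.argv[1])\n"
    "ids = [v['dag_id'] for v in ns.values()\n"
    "       if isinstance(v, dict) and 'dag_id' in v]\n"
    "print(json.dumps(ids))\n"
)


class ParseResult(NamedTuple):
    path: str
    dag_ids: List[str]
    error: Optional[str]


def parse_in_child(path: str, timeout: float = 30.0) -> ParseResult:
    """Parse one DAG file in a separate interpreter so it cannot kill the caller."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", _CHILD_SCRIPT, path],
            capture_output=True, text=True, timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return ParseResult(path, [], "timed out")
    if proc.returncode != 0:
        # e.g. the DAG file called sys.exit(1): only the child exited.
        return ParseResult(path, [], f"exit code {proc.returncode}")
    lines = proc.stdout.strip().splitlines()
    if not lines:
        return ParseResult(path, [], "no output")
    # The user's file may print too, so read the JSON from the last line only.
    return ParseResult(path, json.loads(lines[-1]), None)


def parse_all(paths: List[str], max_workers: int = 4) -> List[ParseResult]:
    """Parse several files concurrently, one child interpreter per file."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(parse_in_child, paths))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        good = os.path.join(tmp, "good_dag.py")
        bad = os.path.join(tmp, "bad_dag.py")
        with open(good, "w") as f:
            f.write("example = {'dag_id': 'example_dag'}\n")
        with open(bad, "w") as f:
            f.write("import sys\nsys.exit(1)\n")
        for result in parse_all([good, bad]):
            print(os.path.basename(result.path), result.dag_ids, result.error)
    # The parent keeps running even though bad_dag.py called sys.exit(1).
    print("scheduler still alive")
```

Threads are enough on the parent side here because each one just waits on its child; the actual parsing work and any damage a bad file does stay inside the child interpreter.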
> Another issue right now is that all DAGs must be scheduled before any tasks
> are sent to the executor. This means that the frequency of task scheduling is
> limited by the slowest DAG to schedule. The changes needed to schedule DAGs
> through child processes will also make it easy to decouple these two steps,
> so that tasks can be scheduled and sent to the executor more independently.
> This way, overall scheduling won't be held back by a slow DAG.
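The difference between "schedule every DAG, then send" and sending each DAG's tasks as soon as that DAG is done can be sketched with `concurrent.futures.as_completed`. Here `schedule_dag` is a hypothetical stand-in that sleeps to simulate a slow DAG, and recording a timestamp stands in for handing tasks to the executor; none of these names come from Airflow.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def schedule_dag(dag_id, delay):
    """Hypothetical: simulate the time spent finding one DAG's runnable tasks."""
    time.sleep(delay)
    return [f"{dag_id}.task_{i}" for i in range(2)]


def batch_send(dags):
    """Send nothing to the 'executor' until every DAG has been scheduled."""
    start = time.monotonic()
    sent = {}  # task id -> seconds after start when it was "sent"
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(schedule_dag, d, delay) for d, delay in dags.items()]
        results = [f.result() for f in futures]  # blocks on the slowest DAG
    for tasks in results:
        for task in tasks:
            sent[task] = time.monotonic() - start
    return sent


def streaming_send(dags):
    """Send each DAG's tasks as soon as that DAG finishes scheduling."""
    start = time.monotonic()
    sent = {}
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(schedule_dag, d, delay) for d, delay in dags.items()]
        for f in as_completed(futures):  # yields futures in completion order
            for task in f.result():
                sent[task] = time.monotonic() - start
    return sent


DAGS = {"fast_dag": 0.0, "slow_dag": 0.5}

if __name__ == "__main__":
    for name, send in (("batch", batch_send), ("streaming", streaming_send)):
        sent = send(DAGS)
        print(f"{name}: fast_dag sent after {sent['fast_dag.task_0']:.2f}s")
```

In the batch version the fast DAG's tasks wait for the slow one; in the streaming version they go out almost immediately, while the slow DAG's tasks still go out once it finishes.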
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)