[ https://issues.apache.org/jira/browse/AIRFLOW-401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15651649#comment-15651649 ]
Kevin Gao commented on AIRFLOW-401:
-----------------------------------

We seem to be running into a similar issue on both versions 1.7.0 and 1.7.1.3. I'm wondering whether this behavior is expected and we're simply using the Airflow scheduler + LocalExecutor incorrectly.

From what I can tell, the scheduler may have run through the last iteration for the life of the process, but is waiting for its child processes (the local executors running the long-running tasks) to complete. As a result, no further tasks can be scheduled until the long-running task finishes. My current thinking is that we should probably switch to the CeleryExecutor in order to make the scheduler independent of the executors.

Relevant configs:
{code:ini}
[core]
executor = LocalExecutor
parallelism = 32
dag_concurrency = 16
dags_are_paused_at_creation = False
max_active_runs_per_dag = 16

[scheduler]
job_heartbeat_sec = 5
scheduler_heartbeat_sec = 5
{code}

The scheduler is run under upstart with {{-n 5}}, i.e. for 5 iterations per process.

Some symptoms:
- No logs are produced by the scheduler.
- The scheduler appears to be blocked on a long-running task.
- 31 of the 32 airflow child processes are listed as defunct.
- Killing the long-running task allows the scheduler to become "unstuck". The scheduler then seems to finish its final iteration and is respawned by upstart.
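To make the suspected mechanism concrete, here is a simplified model written with Python's {{multiprocessing}}. It is not Airflow's scheduler or LocalExecutor code: {{run_scheduler}}, {{short_task}} and {{long_task}} are hypothetical names, and it assumes the Unix {{fork}} start method. The parent finishes all of its iterations, then blocks joining its children until the long-running one exits; {{Process.join()}} does a blocking {{waitpid}}, which strace would show as a blocking {{wait4()}} like the one described below.

```python
import multiprocessing as mp
import time

# Assumption: Unix "fork" start method, as on the Linux hosts described here.
ctx = mp.get_context("fork")


def short_task() -> None:
    """Stand-in for a task instance that finishes almost immediately."""


def long_task(seconds: float) -> None:
    """Stand-in for the long-running task (e.g. the node script)."""
    time.sleep(seconds)


def run_scheduler(num_runs: int, long_task_seconds: float) -> float:
    """Hypothetical model of a scheduler started with `-n num_runs`.

    The long-running task is launched first, then each iteration launches
    one short task as a child process. After the last iteration the parent
    joins every child before it can exit. Returns how long that final join
    blocked, in seconds.
    """
    children = [ctx.Process(target=long_task, args=(long_task_seconds,))]
    children[0].start()
    for _ in range(num_runs):
        child = ctx.Process(target=short_task)
        child.start()
        children.append(child)

    # All iterations are done, but the parent still blocks here (a blocking
    # waitpid on the child) until the long-running child exits.
    started = time.monotonic()
    for child in children:
        child.join()
    return time.monotonic() - started


if __name__ == "__main__":
    blocked = run_scheduler(num_runs=5, long_task_seconds=1.0)
    print(f"final join blocked for about {blocked:.1f}s")
```

In this model the final join blocks for roughly the remaining duration of the long task, regardless of how many iterations were requested, which would be consistent with the scheduler only exiting (and being respawned by upstart) once the long-running task is killed.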
Here is the output from pstree:
{code}
─airflow,4984 usr/local/bin/airflow scheduler -n 5
  ├─(airflow,4990)
  ├─(airflow,4991)
  ├─airflow,4992 usr/local/bin/airflow scheduler -n 5
  │   └─airflow,5086 /usr/local/bin/airflow run dag_name 2016-11-09T01:20:00 --local -sd DAGS_FOLDER/dag_name.py
  │       └─airflow,5092 /usr/local/bin/airflow run dag_name dag_name 2016-11-09T01:20:00 --job_id 582112 --raw -sd DAGS_FOLDER/dag_name.py
  │           └─bash,5102 /tmp/airflowtmpOyW_H1/dag_nameRf_OMJ
  │               ├─sudo,5105 -u someuser node /path/to/some_script.js
  │               │   └─node,5107 /path/to/some_script.js
  │               │       ├─{node},5109
  │               │       ├─{node},5110
  │               │       ├─{node},5111
  │               │       ├─{node},5112
  │               │       ├─{node},5113
  │               │       ├─{node},5114
  │               │       ├─{node},5115
  │               │       └─{node},5116
  │               └─sudo,5106 -u someuser tee -a /var/log/some/log/file.log
  │                   └─tee,5108 -a /var/log/some/log/file.log
  ├─(airflow,4993)
  ├─(airflow,4994)
  ├─(airflow,4995)
  ├─(airflow,4996)
  ├─(airflow,4997)
  ├─(airflow,4998)
  ├─(airflow,4999)
  ├─(airflow,5000)
  ├─(airflow,5001)
  ├─(airflow,5002)
  ├─(airflow,5003)
  ├─(airflow,5004)
  ├─(airflow,5005)
  ├─(airflow,5006)
  ├─(airflow,5007)
  ├─(airflow,5008)
  ├─(airflow,5009)
  ├─(airflow,5010)
  ├─(airflow,5011)
  ├─(airflow,5012)
  ├─(airflow,5013)
  ├─(airflow,5014)
  ├─(airflow,5015)
  ├─(airflow,5016)
  ├─(airflow,5017)
  ├─(airflow,5018)
  ├─(airflow,5019)
  ├─(airflow,5020)
  ├─(airflow,5021)
  └─{airflow},5029
{code}

Stracing process 4992 shows that it's waiting for the child process to terminate: {{wait4(5086,}}. Stracing process 4984, the root process, shows that it's also waiting for some state change, presumably for the child process to change state: {{futex(0x7f7ac5efc000, FUTEX_WAIT, 0, NULL}}.
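The parenthesised {{(airflow,NNNN)}} entries in the tree are the ones reported as defunct. A minimal, Linux-only sketch (not an Airflow or pstree feature) for listing a process's zombie children directly from {{/proc}} is below; the name {{defunct_children}} is hypothetical, and it relies on the {{/proc/<pid>/stat}} layout "pid (comm) state ppid ...", parsing from the last ")" because the command name may itself contain spaces or parentheses.

```python
import os
import sys
from typing import List


def defunct_children(parent_pid: int) -> List[int]:
    """Return the PIDs of zombie (state 'Z') processes whose parent is parent_pid.

    Linux-only: scans /proc/<pid>/stat for every numeric entry in /proc.
    """
    zombies = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat") as f:
                stat = f.read()
        except OSError:
            # The process went away between listdir() and open()/read().
            continue
        # Fields after "(comm) " are: state, ppid, pgrp, ...
        fields = stat[stat.rindex(")") + 2:].split()
        state, ppid = fields[0], int(fields[1])
        if state == "Z" and ppid == parent_pid:
            zombies.append(int(entry))
    return sorted(zombies)


if __name__ == "__main__":
    # Usage: python this_script.py <parent_pid>  (defaults to our own parent)
    pid = int(sys.argv[1]) if len(sys.argv) > 1 else os.getppid()
    print(defunct_children(pid))
```

Passing the root scheduler PID (4984 in the tree above) would be expected to list the parenthesised children, which stay defunct until their parent reaps them with {{wait4()}}/{{waitpid()}}.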
Here is some more complete strace output I had from a previous time when it was hung in this state: {code} rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 futex(0x7f9eb8000ce0, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x17aff40, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x7f9eb8000d00, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x17aff40, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x7f9eb8000d00, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x17aff40, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x7f9eb8000d00, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x17aff40, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x3bc5840, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x17aff40, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x7f9eb8000d00, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x17aff40, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x7f9eb8000d00, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x17aff40, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x7f9eb8000d00, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x17aff40, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x7f9eb8000d00, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x17aff40, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x7f9eb8000d00, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x17aff40, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x7f9eb8000d00, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x17aff40, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x3bc5840, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x17aff40, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x7f9eb8000d00, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x17aff40, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x7f9eb8000d00, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x17aff40, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x7f9eb8000d00, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x17aff40, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x7f9eb8000d00, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x7f9ec9141000, FUTEX_WAKE, 1) = 1 futex(0x17aff40, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x7f9eb8000d00, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x7f9ec9141000, FUTEX_WAKE, 1) = 1 futex(0x17aff40, FUTEX_WAKE_PRIVATE, 1) = 1 --- SIGCHLD (Child exited) @ 0 (0) --- futex(0x7f9eb8000d00, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x7f9ec9141000, FUTEX_WAKE, 1) = 0 futex(0x17aff40, FUTEX_WAKE_PRIVATE, 1) = 1 --- SIGCHLD (Child 
exited) @ 0 (0) --- --- SIGCHLD (Child exited) @ 0 (0) --- futex(0x7f9eb8000d00, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x17aff40, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x3bc5840, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x17aff40, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x7f9eb8000d00, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x17aff40, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x7f9eb8000d00, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x17aff40, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x7f9eb8000d00, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x17aff40, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x7f9eb8000d00, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x17aff40, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x7f9eb8000d00, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x17aff40, FUTEX_WAKE_PRIVATE, 1) = 1 --- SIGCHLD (Child exited) @ 0 (0) --- futex(0x7f9eb8000d00, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x17aff40, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x7f9eb8000d00, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x17aff40, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x3bc5840, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x17aff40, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x7f9eb8000d00, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x17aff40, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x7f9eb8000d00, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x17aff40, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x7f9eb8000d00, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x17aff40, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x7f9eb8000d00, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x17aff40, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x7f9eb8000d00, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x17aff40, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x7f9eb8000d00, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x17aff40, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x7f9eb8000d00, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x17aff40, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x3bc5840, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x17aff40, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x7f9eb8000d00, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x17aff40, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x7f9ec913e000, FUTEX_WAIT, 0, NULL) = ? 
ERESTARTSYS (To be restarted) --- SIGCHLD (Child exited) @ 0 (0) --- futex(0x7f9ec913e000, FUTEX_WAIT, 0, NULL) = ? ERESTARTSYS (To be restarted) --- SIGCHLD (Child exited) @ 0 (0) --- futex(0x7f9ec913e000, FUTEX_WAIT, 0, NULL) = ? ERESTARTSYS (To be restarted) --- SIGCHLD (Child exited) @ 0 (0) --- futex(0x7f9ec913e000, FUTEX_WAIT, 0, NULL) = ? ERESTARTSYS (To be restarted) --- SIGCHLD (Child exited) @ 0 (0) --- futex(0x7f9ec913e000, FUTEX_WAIT, 0, NULL) = ? ERESTARTSYS (To be restarted) --- SIGCHLD (Child exited) @ 0 (0) --- futex(0x7f9ec913e000, FUTEX_WAIT, 0, NULL) = ? ERESTARTSYS (To be restarted) --- SIGCHLD (Child exited) @ 0 (0) --- futex(0x7f9ec913e000, FUTEX_WAIT, 0, NULL) = ? ERESTARTSYS (To be restarted) --- SIGCHLD (Child exited) @ 0 (0) --- futex(0x7f9ec913e000, FUTEX_WAIT, 0, NULL) = ? ERESTARTSYS (To be restarted) --- SIGCHLD (Child exited) @ 0 (0) --- futex(0x7f9ec913e000, FUTEX_WAIT, 0, NULL) = ? ERESTARTSYS (To be restarted) --- SIGCHLD (Child exited) @ 0 (0) --- futex(0x7f9ec913e000, FUTEX_WAIT, 0, NULL) = ? ERESTARTSYS (To be restarted) --- SIGCHLD (Child exited) @ 0 (0) --- futex(0x7f9ec913e000, FUTEX_WAIT, 0, NULL) = ? ERESTARTSYS (To be restarted) --- SIGCHLD (Child exited) @ 0 (0) --- futex(0x7f9ec913e000, FUTEX_WAIT, 0, NULL) = ? ERESTARTSYS (To be restarted) --- SIGCHLD (Child exited) @ 0 (0) --- futex(0x7f9ec913e000, FUTEX_WAIT, 0, NULL) = ? ERESTARTSYS (To be restarted) --- SIGCHLD (Child exited) @ 0 (0) --- futex(0x7f9ec913e000, FUTEX_WAIT, 0, NULL) = ? ERESTARTSYS (To be restarted) --- SIGCHLD (Child exited) @ 0 (0) --- futex(0x7f9ec913e000, FUTEX_WAIT, 0, NULL) = ? ERESTARTSYS (To be restarted) --- SIGCHLD (Child exited) @ 0 (0) --- futex(0x7f9ec913e000, FUTEX_WAIT, 0, NULL) = ? ERESTARTSYS (To be restarted) --- SIGCHLD (Child exited) @ 0 (0) --- futex(0x7f9ec913e000, FUTEX_WAIT, 0, NULL) = ? ERESTARTSYS (To be restarted) --- SIGCHLD (Child exited) @ 0 (0) --- futex(0x7f9ec913e000, FUTEX_WAIT, 0, NULL) = ? 
ERESTARTSYS (To be restarted)
--- SIGCHLD (Child exited) @ 0 (0) ---
futex(0x7f9ec913e000, FUTEX_WAIT, 0, NULL) = ? ERESTARTSYS (To be restarted)
--- SIGCHLD (Child exited) @ 0 (0) ---
futex(0x7f9ec913e000, FUTEX_WAIT, 0, NULL) = ? ERESTARTSYS (To be restarted)
--- SIGCHLD (Child exited) @ 0 (0) ---
futex(0x7f9ec913e000, FUTEX_WAIT, 0, NULL
########################################################
# At this point I manually killed the long running task: sudo kill 25116
#
########################################################
futex(0x7f9ec913f000, FUTEX_WAKE, 1) = 1
select(7, [6], NULL, NULL, {0, 0}) = 1 (in [6], left {0, 0})
read(6, "\0\0\0c", 4) = 4
read(6, "\200\2U!DAG0_NAME"..., 99) = 99
select(7, [6], NULL, NULL, {0, 0}) = 1 (in [6], left {0, 0})
read(6, "\0\0\0h", 4) = 4
read(6, "\200\2U&DAG1_NAME"..., 104) = 104
select(7, [6], NULL, NULL, {0, 0}) = 1 (in [6], left {0, 0})
read(6, "\0\0\0V", 4) = 4
read(6, "\200\2U\24DAG2_NAME"..., 86) = 86
select(7, [6], NULL, NULL, {0, 0}) = 1 (in [6], left {0, 0})
read(6, "\0\0\0f", 4) = 4
read(6, "\200\2U%DAG3_NAME"..., 102) = 102
select(7, [6], NULL, NULL, {0, 0}) = 0 (Timeout)
munmap(0x7f9ec9138000, 32) = 0
close(9) = 0
munmap(0x7f9ec913a000, 32) = 0
close(8) = 0
munmap(0x7f9ec9139000, 32) = 0
gettimeofday({1478679655, 152452}, NULL) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0
gettimeofday({1478679655, 153818}, NULL) = 0
rt_sigprocmask(SIG_BLOCK, [PIPE], [], 8) = 0
write(3, "redacted"..., 40) = 40
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
poll([{fd=3, events=POLLIN|POLLERR}], 1, -1) = 1 ([{fd=3, revents=POLLIN}])
rt_sigprocmask(SIG_BLOCK, [PIPE], [], 8) = 0
read(3, "redacted", 5) = 5
read(3, "redacted"..., 41) = 41
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [PIPE], [], 8) = 0
write(3, "redacted"..., 43) = 43
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
poll([{fd=3, events=POLLIN|POLLERR}], 1, -1) = 1 ([{fd=3, revents=POLLIN}])
rt_sigprocmask(SIG_BLOCK, [PIPE], [], 8) = 0
read(3, "redacted", 5) = 5
read(3, "redacted"..., 90) = 90
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [PIPE], [], 8) = 0
write(3, "redacted"..., 420) = 420
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
poll([{fd=3, events=POLLIN|POLLERR}], 1, -1) = 1 ([{fd=3, revents=POLLIN}])
rt_sigprocmask(SIG_BLOCK, [PIPE], [], 8) = 0
read(3, "redacted", 5) = 5
read(3, "redacted"..., 535) = 535
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [PIPE], [], 8) = 0
write(3, "redacted"..., 195) = 195
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
poll([{fd=3, events=POLLIN|POLLERR}], 1, -1) = 1 ([{fd=3, revents=POLLIN}])
rt_sigprocmask(SIG_BLOCK, [PIPE], [], 8) = 0
read(3, "redacted", 5) = 5
read(3, "redacted"..., 44) = 44
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [PIPE], [], 8) = 0
write(3, "redacted"..., 41) = 41
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
poll([{fd=3, events=POLLIN|POLLERR}], 1, -1) = 1 ([{fd=3, revents=POLLIN}])
rt_sigprocmask(SIG_BLOCK, [PIPE], [], 8) = 0
read(3, "redacted", 5) = 5
read(3, "redacted"..., 42) = 42
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
futex(0x7f9eb8000d00, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x17aff40, FUTEX_WAKE_PRIVATE, 1) = 1
wait4(25019, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG, NULL) = 25019
wait4(25021, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG, NULL) = 25021
wait4(25030, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG, NULL) = 25030
wait4(25020, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG, NULL) = 25020
wait4(25015, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG, NULL) = 25015
wait4(25017, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG, NULL) = 25017
wait4(25006, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG, NULL) = 25006
wait4(25012, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG, NULL) = 25012
wait4(25033, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG, NULL) = 25033
wait4(25007, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG, NULL) = 25007
wait4(25008, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG, NULL) = 25008
wait4(25009, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG, NULL) = 25009
wait4(25031, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG, NULL) = 25031
wait4(25026, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG, NULL) = 25026
wait4(25013, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG, NULL) = 25013
wait4(25011, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG, NULL) = 25011
wait4(25010, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG, NULL) = 25010
wait4(25022, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG, NULL) = 25022
wait4(25018, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG, NULL) = 25018
wait4(25029, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG, NULL) = 25029
wait4(25025, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG, NULL) = 25025
wait4(25024, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG, NULL) = 25024
wait4(25027, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG, NULL) = 25027
wait4(25034, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG, NULL) = 25034
wait4(25028, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG, NULL) = 25028
wait4(25023, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG, NULL) = 25023
wait4(25014, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG, NULL) = 25014
wait4(25032, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG, NULL) = 25032
wait4(25035, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG, NULL) = 25035
wait4(25037, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG, NULL) = 25037
wait4(25036, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG, NULL) = 25036
wait4(25016, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG, NULL) = 25016
rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x7f9ec8da2cb0}, {0x5570f0, [], SA_RESTORER, 0x7f9ec8da2cb0}, 8) = 0
rt_sigaction(SIGALRM, {SIG_DFL, [], SA_RESTORER, 0x7f9ec8da2cb0}, {0x5570f0, [], SA_RESTORER, 0x7f9ec8da2cb0}, 8) = 0
rt_sigaction(SIGTERM, {SIG_DFL, [], SA_RESTORER, 0x7f9ec8da2cb0}, {0x5570f0, [], SA_RESTORER, 0x7f9ec8da2cb0}, 8) = 0
exit_group(0) = ?
{code}

> scheduler gets stuck without a trace
> ------------------------------------
>
> Key: AIRFLOW-401
> URL: https://issues.apache.org/jira/browse/AIRFLOW-401
> Project: Apache Airflow
> Issue Type: Bug
> Components: executor, scheduler
> Affects Versions: Airflow 1.7.1.3
> Reporter: Nadeem Ahmed Nazeer
> Assignee: Bolke de Bruin
> Priority: Minor
> Attachments: Dag_code.txt, schduler_cpu100%.png, scheduler_stuck.png, scheduler_stuck_7hours.png
>
>
> The scheduler gets stuck without a trace or error. When this happens, the CPU usage of the scheduler service is at 100%. No jobs get submitted and everything comes to a halt. It looks like it goes into some kind of infinite loop.
> The only way I could make it run again is by manually restarting the scheduler service. But again, after running some tasks it gets stuck. I've tried with both the Celery and Local executors, but the same issue occurs. I am using the -n 3 parameter while starting the scheduler.
> Scheduler configs:
> job_heartbeat_sec = 5
> scheduler_heartbeat_sec = 5
> executor = LocalExecutor
> parallelism = 32
> Please help. I would be happy to provide any other information needed.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)