gbonazzoli opened a new issue #19343:
URL: https://github.com/apache/airflow/issues/19343


   ### Apache Airflow version
   
   2.2.1 (latest released)
   
   ### Operating System
   
   Ubuntu 20.04.3 LTS
   
   ### Versions of Apache Airflow Providers
   
   ```
   apache-airflow-providers-celery==2.1.0
   apache-airflow-providers-ftp==2.0.1
   apache-airflow-providers-http==2.0.1
   apache-airflow-providers-imap==2.0.1
   apache-airflow-providers-microsoft-mssql==2.0.1
   apache-airflow-providers-microsoft-winrm==2.0.1
   apache-airflow-providers-openfaas==2.0.0
   apache-airflow-providers-oracle==2.0.1
   apache-airflow-providers-samba==3.0.0
   apache-airflow-providers-sftp==2.1.1
   apache-airflow-providers-sqlite==2.0.1
   apache-airflow-providers-ssh==2.2.0
   ```
   
   ### Deployment
   
   Virtualenv installation
   
   ### Deployment details
   
   Airflow 2.2.1 on an LXD container, "all in one" (webserver, scheduler, and PostgreSQL database).
   
   ### What happened
   
   I don't know if it is related to the daylight saving time change we had in Italy (clocks going back from 03:00 to 02:00) during the night between October 30th and 31st.
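
   For context, this is what that clock change looks like with pendulum (which Airflow already depends on); it is only an illustration of the transition, not taken from any of my DAGs:

   ```python
   # The DST transition in question: Europe/Rome left CEST (UTC+2) for
   # CET (UTC+1) during the night of the clock change, so the 02:00-03:00
   # wall-clock hour occurred twice.
   import pendulum

   rome = pendulum.timezone("Europe/Rome")

   before = rome.convert(pendulum.datetime(2021, 10, 31, 0, 30, tz="UTC"))
   after = rome.convert(pendulum.datetime(2021, 10, 31, 1, 30, tz="UTC"))

   print(before)  # 2021-10-31 02:30:00+02:00 (still CEST)
   print(after)   # 2021-10-31 02:30:00+01:00 (same wall-clock time, now CET)
   ```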
   
   The result is that, later that same day, the scheduler ended up in a broken state.
   
   The output of the command `airflow scheduler` is:
   
   ```
   root@new-airflow:~/airflow# airflow scheduler
     ____________       _____________
    ____    |__( )_________  __/__  /________      __
   ____  /| |_  /__  ___/_  /_ __  /_  __ \_ | /| / /
   ___  ___ |  / _  /   _  __/ _  / / /_/ /_ |/ |/ /
    _/_/  |_/_/  /_/    /_/    /_/  \____/____/|__/
   [2021-11-01 02:00:41,181] {scheduler_job.py:596} INFO - Starting the scheduler
   [2021-11-01 02:00:41,181] {scheduler_job.py:601} INFO - Processing each file at most -1 times
   [2021-11-01 02:00:41,267] {manager.py:163} INFO - Launched DagFileProcessorManager with pid: 12284
   [2021-11-01 02:00:41,268] {scheduler_job.py:1115} INFO - Resetting orphaned tasks for active dag runs
   [2021-11-01 02:00:41,269] {settings.py:52} INFO - Configured default timezone Timezone('UTC')
   [2021-11-01 02:00:41,332] {celery_executor.py:493} INFO - Adopted the following 7 tasks from a dead executor
        <TaskInstance: EXEC_SAVE_ORACLE_SOURCE.PSOFA_PSO_PKG_UTILITY_SP_SAVE_ORACLE_SOURCE scheduled__2021-10-30T17:30:00+00:00 [running]> in state STARTED
        <TaskInstance: EXEC_MAIL_VENDUTO_LIMASTE-BALLETTA.PSO_SP_DATI_BB_V4 scheduled__2021-10-30T19:00:00+00:00 [running]> in state STARTED
        <TaskInstance: EXEC_MAIL_TASSO_CONV.PSO_SP_DATI_BB_V6 scheduled__2021-10-30T20:35:00+00:00 [running]> in state STARTED
        <TaskInstance: EXEC_MAIL_VENDUTO_UNICA_AM.PSO_SP_DATI_BB_V6 scheduled__2021-10-30T19:20:00+00:00 [running]> in state STARTED
        <TaskInstance: EXEC_BI_ASYNC.bi_pkg_batch_carica_async_2 scheduled__2021-10-30T23:00:00+00:00 [running]> in state STARTED
        <TaskInstance: EXEC_MAIL_INGRESSI_UNICA.PSO_SP_INGRESSI_BB_V4 scheduled__2021-10-30T20:15:00+00:00 [running]> in state STARTED
        <TaskInstance: API_REFRESH_PSO_ANALISI_CONS_ORDINE_EXCEL.Refresh_Table scheduled__2021-10-31T07:29:00+00:00 [running]> in state STARTED
   [2021-11-01 02:00:41,440] {dagrun.py:511} INFO - Marking run <DagRun EXEC_CALCOLO_FILTRO_RR_INCREMENTALE @ 2021-10-30 18:00:00+00:00: scheduled__2021-10-30T18:00:00+00:00, externally triggered: False> successful
   [2021-11-01 02:00:41,441] {dagrun.py:556} INFO - DagRun Finished: dag_id=EXEC_CALCOLO_FILTRO_RR_INCREMENTALE, execution_date=2021-10-30 18:00:00+00:00, run_id=scheduled__2021-10-30T18:00:00+00:00, run_start_date=2021-10-31 09:00:00.440704+00:00, run_end_date=2021-11-01 01:00:41.441139+00:00, run_duration=57641.000435, state=success, external_trigger=False, run_type=scheduled, data_interval_start=2021-10-30 18:00:00+00:00, data_interval_end=2021-10-31 09:00:00+00:00, dag_hash=91db1a3fa29d7dba470ee53feddb124b
   [2021-11-01 02:00:41,444] {scheduler_job.py:644} ERROR - Exception when executing SchedulerJob._run_scheduler_loop
   Traceback (most recent call last):
     File "/usr/local/lib/python3.8/dist-packages/airflow/jobs/scheduler_job.py", line 628, in _execute
       self._run_scheduler_loop()
     File "/usr/local/lib/python3.8/dist-packages/airflow/jobs/scheduler_job.py", line 709, in _run_scheduler_loop
       num_queued_tis = self._do_scheduling(session)
     File "/usr/local/lib/python3.8/dist-packages/airflow/jobs/scheduler_job.py", line 792, in _do_scheduling
       callback_to_run = self._schedule_dag_run(dag_run, session)
     File "/usr/local/lib/python3.8/dist-packages/airflow/jobs/scheduler_job.py", line 1044, in _schedule_dag_run
       self._update_dag_next_dagruns(dag, dag_model, active_runs)
     File "/usr/local/lib/python3.8/dist-packages/airflow/jobs/scheduler_job.py", line 935, in _update_dag_next_dagruns
       data_interval = dag.get_next_data_interval(dag_model)
     File "/usr/local/lib/python3.8/dist-packages/airflow/models/dag.py", line 629, in get_next_data_interval
       return self.infer_automated_data_interval(dag_model.next_dagrun)
     File "/usr/local/lib/python3.8/dist-packages/airflow/models/dag.py", line 667, in infer_automated_data_interval
       end = cast(CronDataIntervalTimetable, self.timetable)._get_next(start)
     File "/usr/local/lib/python3.8/dist-packages/airflow/timetables/interval.py", line 171, in _get_next
       naive = make_naive(current, self._timezone)
     File "/usr/local/lib/python3.8/dist-packages/airflow/utils/timezone.py", line 143, in make_naive
       if is_naive(value):
     File "/usr/local/lib/python3.8/dist-packages/airflow/utils/timezone.py", line 50, in is_naive
       return value.utcoffset() is None
   AttributeError: 'NoneType' object has no attribute 'utcoffset'
   [2021-11-01 02:00:42,459] {process_utils.py:100} INFO - Sending Signals.SIGTERM to GPID 12284
   [2021-11-01 02:00:42,753] {process_utils.py:212} INFO - Waiting up to 5 seconds for processes to exit...
   [2021-11-01 02:00:42,792] {process_utils.py:66} INFO - Process psutil.Process(pid=12342, status='terminated', started='02:00:41') (12342) terminated with exit code None
   [2021-11-01 02:00:42,792] {process_utils.py:66} INFO - Process psutil.Process(pid=12284, status='terminated', exitcode=0, started='02:00:40') (12284) terminated with exit code 0
   [2021-11-01 02:00:42,792] {process_utils.py:66} INFO - Process psutil.Process(pid=12317, status='terminated', started='02:00:41') (12317) terminated with exit code None
   [2021-11-01 02:00:42,792] {scheduler_job.py:655} INFO - Exited execute loop
   Traceback (most recent call last):
     File "/usr/local/bin/airflow", line 8, in <module>
       sys.exit(main())
     File "/usr/local/lib/python3.8/dist-packages/airflow/__main__.py", line 48, in main
       args.func(args)
     File "/usr/local/lib/python3.8/dist-packages/airflow/cli/cli_parser.py", line 48, in command
       return func(*args, **kwargs)
     File "/usr/local/lib/python3.8/dist-packages/airflow/utils/cli.py", line 92, in wrapper
       return f(*args, **kwargs)
     File "/usr/local/lib/python3.8/dist-packages/airflow/cli/commands/scheduler_command.py", line 75, in scheduler
       _run_scheduler_job(args=args)
     File "/usr/local/lib/python3.8/dist-packages/airflow/cli/commands/scheduler_command.py", line 46, in _run_scheduler_job
       job.run()
     File "/usr/local/lib/python3.8/dist-packages/airflow/jobs/base_job.py", line 245, in run
       self._execute()
     File "/usr/local/lib/python3.8/dist-packages/airflow/jobs/scheduler_job.py", line 628, in _execute
       self._run_scheduler_loop()
     File "/usr/local/lib/python3.8/dist-packages/airflow/jobs/scheduler_job.py", line 709, in _run_scheduler_loop
       num_queued_tis = self._do_scheduling(session)
     File "/usr/local/lib/python3.8/dist-packages/airflow/jobs/scheduler_job.py", line 792, in _do_scheduling
       callback_to_run = self._schedule_dag_run(dag_run, session)
     File "/usr/local/lib/python3.8/dist-packages/airflow/jobs/scheduler_job.py", line 1044, in _schedule_dag_run
       self._update_dag_next_dagruns(dag, dag_model, active_runs)
     File "/usr/local/lib/python3.8/dist-packages/airflow/jobs/scheduler_job.py", line 935, in _update_dag_next_dagruns
       data_interval = dag.get_next_data_interval(dag_model)
     File "/usr/local/lib/python3.8/dist-packages/airflow/models/dag.py", line 629, in get_next_data_interval
       return self.infer_automated_data_interval(dag_model.next_dagrun)
     File "/usr/local/lib/python3.8/dist-packages/airflow/models/dag.py", line 667, in infer_automated_data_interval
       end = cast(CronDataIntervalTimetable, self.timetable)._get_next(start)
     File "/usr/local/lib/python3.8/dist-packages/airflow/timetables/interval.py", line 171, in _get_next
       naive = make_naive(current, self._timezone)
     File "/usr/local/lib/python3.8/dist-packages/airflow/utils/timezone.py", line 143, in make_naive
       if is_naive(value):
     File "/usr/local/lib/python3.8/dist-packages/airflow/utils/timezone.py", line 50, in is_naive
       return value.utcoffset() is None
   AttributeError: 'NoneType' object has no attribute 'utcoffset'
   ```
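
   Reading the traceback, the crash seems to reduce to `make_naive()` being called with `None`, presumably because `next_dagrun` on the DagModel row ended up NULL. A minimal sketch of that failure path (reconstructed from the traceback above, not copied from the Airflow sources):

   ```python
   # Sketch of the failing pattern: nothing guards against value being None,
   # so None reaches .utcoffset() and raises AttributeError before any
   # meaningful error can be produced.
   from datetime import datetime, timezone
   from typing import Optional


   def is_naive(value: datetime) -> bool:
       # Same check as airflow/utils/timezone.py:50 in the traceback.
       return value.utcoffset() is None


   def make_naive(value: datetime, tz: timezone) -> datetime:
       # Same shape as airflow/utils/timezone.py:143 in the traceback.
       if is_naive(value):
           raise ValueError("make_naive() cannot be applied to a naive datetime")
       return value.astimezone(tz).replace(tzinfo=None)


   next_dagrun: Optional[datetime] = None  # what DagModel.next_dagrun apparently held
   make_naive(next_dagrun, timezone.utc)
   # AttributeError: 'NoneType' object has no attribute 'utcoffset'
   ```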
   There was no way to get Airflow started!
   
   I restored the previous day's backup in order to get Airflow up and running again.
   
   Now it works, but at startup Airflow launched all the jobs it thought had not been executed, causing some problems on the database due to this unusual load.
   
   Is there a way to avoid this behaviour at startup?
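
   One mitigation I am considering, assuming the load comes from catch-up runs for the intervals missed while the scheduler was down, is disabling catch-up, either per DAG or globally with `catchup_by_default = False` in the `[scheduler]` section of `airflow.cfg`. A minimal sketch with hypothetical DAG and task names:

   ```python
   # With catchup=False the scheduler only creates the most recent run
   # instead of backfilling every interval missed during downtime.
   from datetime import datetime

   from airflow import DAG
   from airflow.operators.bash import BashOperator

   with DAG(
       dag_id="example_no_catchup",      # hypothetical DAG id
       start_date=datetime(2021, 10, 1),
       schedule_interval="30 17 * * *",  # hypothetical schedule
       catchup=False,                    # skip runs missed while down
   ) as dag:
       BashOperator(task_id="run_job", bash_command="echo run")
   ```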
   
   ### What you expected to happen
   
   _No response_
   
   ### How to reproduce
   
   _No response_
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

