yuzeh opened a new issue #13393:
URL: https://github.com/apache/airflow/issues/13393


   **Apache Airflow version**: 2.0.0
   
   **Kubernetes version (if you are using Kubernetes)** (use `kubectl version`): N/A
   
   **Environment**: docker-compose
   
   - **Cloud provider or hardware configuration**: local / gcp
   - **OS** (e.g. from /etc/os-release): Ubuntu 20 on WSL 2 (local) / Ubuntu 18 (gcp)
   - **Kernel** (e.g. `uname -a`): 4.19.128-microsoft-standard (local) / 5.4.0-1032-gcp (gcp)
   - **Install tools**: pip (for airflow), apt (for dependencies)
   - **Others**:
   
   **What happened**:
   We cannot upgrade to Airflow 2.0 while retaining our existing database (which we have used since Airflow 1.10.11).
   
   After running `airflow db upgrade` and then running `airflow scheduler`, the scheduler reports the following error and crashes.
   
   This error doesn't show up when we set up Airflow 2.0 from a fresh DB, so I'm inclined to believe something in our Airflow DB is corrupted.
   
   <details>
   <summary>Scheduler traceback</summary>
   
   ```
   webserver_1        | Traceback (most recent call last):
   webserver_1        |   File "/usr/local/lib/python3.7/site-packages/pendulum/tz/zoneinfo/reader.py", line 50, in read_for
   webserver_1        |     file_path = pytzdata.tz_path(timezone)
   webserver_1        |   File "/usr/local/lib/python3.7/site-packages/pytzdata/__init__.py", line 74, in tz_path
   webserver_1        |     raise TimezoneNotFound('Timezone {} not found at {}'.format(name, filepath))
   webserver_1        | pytzdata.exceptions.TimezoneNotFound: Timezone tzlocal() not found at /usr/local/lib/python3.7/site-packages/pytzdata/zoneinfo/tzlocal()
   webserver_1        |
   webserver_1        | During handling of the above exception, another exception occurred:
   webserver_1        |
   webserver_1        | Traceback (most recent call last):
   webserver_1        |   File "/usr/local/lib/python3.7/site-packages/airflow/jobs/scheduler_job.py", line 1275, in _execute
   webserver_1        |     self._run_scheduler_loop()
   webserver_1        |   File "/usr/local/lib/python3.7/site-packages/airflow/jobs/scheduler_job.py", line 1377, in _run_scheduler_loop
   webserver_1        |     num_queued_tis = self._do_scheduling(session)
   webserver_1        |   File "/usr/local/lib/python3.7/site-packages/airflow/jobs/scheduler_job.py", line 1474, in _do_scheduling
   webserver_1        |     self._create_dag_runs(query.all(), session)
   webserver_1        |   File "/usr/local/lib/python3.7/site-packages/airflow/jobs/scheduler_job.py", line 1557, in _create_dag_runs
   webserver_1        |     dag = self.dagbag.get_dag(dag_model.dag_id, session=session)
   webserver_1        |   File "/usr/local/lib/python3.7/site-packages/airflow/utils/session.py", line 62, in wrapper
   webserver_1        |     return func(*args, **kwargs)
   webserver_1        |   File "/usr/local/lib/python3.7/site-packages/airflow/models/dagbag.py", line 171, in get_dag
   webserver_1        |     self._add_dag_from_db(dag_id=dag_id, session=session)
   webserver_1        |   File "/usr/local/lib/python3.7/site-packages/airflow/models/dagbag.py", line 229, in _add_dag_from_db
   webserver_1        |     dag = row.dag
   webserver_1        |   File "/usr/local/lib/python3.7/site-packages/airflow/models/serialized_dag.py", line 167, in dag
   webserver_1        |     dag = SerializedDAG.from_dict(self.data)  # type: Any
   webserver_1        |   File "/usr/local/lib/python3.7/site-packages/airflow/serialization/serialized_objects.py", line 719, in from_dict
   webserver_1        |     return cls.deserialize_dag(serialized_obj['dag'])
   webserver_1        |   File "/usr/local/lib/python3.7/site-packages/airflow/serialization/serialized_objects.py", line 655, in deserialize_dag
   webserver_1        |     v = cls._deserialize_timezone(v)
   webserver_1        |   File "/usr/local/lib/python3.7/site-packages/pendulum/tz/__init__.py", line 37, in timezone
   webserver_1        |     tz = _Timezone(name, extended=extended)
   webserver_1        |   File "/usr/local/lib/python3.7/site-packages/pendulum/tz/timezone.py", line 40, in __init__
   webserver_1        |     tz = read(name, extend=extended)
   webserver_1        |   File "/usr/local/lib/python3.7/site-packages/pendulum/tz/zoneinfo/__init__.py", line 9, in read
   webserver_1        |     return Reader(extend=extend).read_for(name)
   webserver_1        |   File "/usr/local/lib/python3.7/site-packages/pendulum/tz/zoneinfo/reader.py", line 52, in read_for
   webserver_1        |     raise InvalidTimezone(timezone)
   webserver_1        | pendulum.tz.zoneinfo.exceptions.InvalidTimezone: Invalid timezone "tzlocal()"
   webserver_1        | [2020-12-30 08:55:39,257] {{settings.py:52}} INFO - Configured default timezone Timezone('UTC')
   ```
   </details>
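   Judging by the last frames, the failure can be reproduced in isolation by handing pendulum the literal string from the serialized DAG. A minimal sketch, assuming pendulum 2.x (the version used by Airflow 2.0) and that the serialized timezone value really is the string `tzlocal()`:
   
   ```python
   # Minimal sketch: pendulum rejects the literal string "tzlocal()", matching
   # the last frame of the traceback above. Assumes pendulum 2.x.
   import pendulum
   
   try:
       pendulum.tz.timezone("tzlocal()")
   except Exception as exc:
       print(f"{type(exc).__name__}: {exc}")
       # -> InvalidTimezone: Invalid timezone "tzlocal()"
   ```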
   
   **How to reproduce it**:
   
   I've been able to reproduce this on two systems using the same DB backup, but I can't share the backup because it contains confidential information.
   
   Any thoughts on how to develop a minimal reproducible test case would be appreciated! One idea is sketched below.
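   One way to narrow this down without sharing the backup is to scan the metadata DB for the bad value directly. A sketch, assuming the offending string lives in the `serialized_dag` table that the traceback is reading from:
   
   ```python
   # Sketch: find serialized DAGs containing the literal string "tzlocal()"
   # without triggering deserialization (row.data is the raw JSON dict;
   # row.dag is the property that raises). Run against the Airflow 2.0
   # metadata DB, e.g. inside the scheduler container.
   from airflow.models.serialized_dag import SerializedDagModel
   from airflow.utils.session import create_session
   
   with create_session() as session:
       for row in session.query(SerializedDagModel).all():
           if "tzlocal()" in str(row.data):
               print("suspect DAG:", row.dag_id)
   ```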
   
   **Anything else we need to know**:
   
   We use some of the maintenance DAGs in this repo (https://github.com/teamclairvoyant/airflow-maintenance-dags), which directly edit the Airflow metadata DB. I suspect that something one of these DAGs did may have corrupted our Airflow DB.
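   For what it's worth, dateutil's local-timezone object stringifies to exactly the value in the traceback, so (this is an assumption, not a confirmed root cause) anything that persisted `str()` of a `dateutil.tz.tzlocal` instance could have planted that string in the DB:
   
   ```python
   # Illustration (assumption, not a confirmed root cause): dateutil's tzlocal
   # stringifies to exactly the literal seen in the traceback.
   from dateutil import tz
   
   print(str(tz.tzlocal()))  # -> tzlocal()
   ```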

