yuzeh opened a new issue #13393:
URL: https://github.com/apache/airflow/issues/13393
**Apache Airflow version**: 2.0.0
**Kubernetes version (if you are using kubernetes)** (use `kubectl version`): N/A
**Environment**: docker-compose
- **Cloud provider or hardware configuration**: local / gcp
- **OS** (e.g. from /etc/os-release): Ubuntu 20 on WSL 2 (local) / Ubuntu 18 (gcp)
- **Kernel** (e.g. `uname -a`): 4.19.128-microsoft-standard (local) / 5.4.0-1032-gcp (gcp)
- **Install tools**: pip (for airflow), apt (for dependencies)
- **Others**:
**What happened**:
We cannot upgrade to Airflow 2.0 while retaining our existing database
(which we have used since Airflow 1.10.11).
<details>
<summary>
After running `airflow db upgrade` and then running `airflow scheduler`, the
scheduler reports the following error and crashes.
This error doesn't show up when we set up Airflow 2.0 from a fresh db, so
I'm inclined to believe something in our Airflow DB is corrupted.
</summary>
```
webserver_1 | Traceback (most recent call last):
webserver_1 |   File "/usr/local/lib/python3.7/site-packages/pendulum/tz/zoneinfo/reader.py", line 50, in read_for
webserver_1 |     file_path = pytzdata.tz_path(timezone)
webserver_1 |   File "/usr/local/lib/python3.7/site-packages/pytzdata/__init__.py", line 74, in tz_path
webserver_1 |     raise TimezoneNotFound('Timezone {} not found at {}'.format(name, filepath))
webserver_1 | pytzdata.exceptions.TimezoneNotFound: Timezone tzlocal() not found at /usr/local/lib/python3.7/site-packages/pytzdata/zoneinfo/tzlocal()
webserver_1 |
webserver_1 | During handling of the above exception, another exception occurred:
webserver_1 |
webserver_1 | Traceback (most recent call last):
webserver_1 |   File "/usr/local/lib/python3.7/site-packages/airflow/jobs/scheduler_job.py", line 1275, in _execute
webserver_1 |     self._run_scheduler_loop()
webserver_1 |   File "/usr/local/lib/python3.7/site-packages/airflow/jobs/scheduler_job.py", line 1377, in _run_scheduler_loop
webserver_1 |     num_queued_tis = self._do_scheduling(session)
webserver_1 |   File "/usr/local/lib/python3.7/site-packages/airflow/jobs/scheduler_job.py", line 1474, in _do_scheduling
webserver_1 |     self._create_dag_runs(query.all(), session)
webserver_1 |   File "/usr/local/lib/python3.7/site-packages/airflow/jobs/scheduler_job.py", line 1557, in _create_dag_runs
webserver_1 |     dag = self.dagbag.get_dag(dag_model.dag_id, session=session)
webserver_1 |   File "/usr/local/lib/python3.7/site-packages/airflow/utils/session.py", line 62, in wrapper
webserver_1 |     return func(*args, **kwargs)
webserver_1 |   File "/usr/local/lib/python3.7/site-packages/airflow/models/dagbag.py", line 171, in get_dag
webserver_1 |     self._add_dag_from_db(dag_id=dag_id, session=session)
webserver_1 |   File "/usr/local/lib/python3.7/site-packages/airflow/models/dagbag.py", line 229, in _add_dag_from_db
webserver_1 |     dag = row.dag
webserver_1 |   File "/usr/local/lib/python3.7/site-packages/airflow/models/serialized_dag.py", line 167, in dag
webserver_1 |     dag = SerializedDAG.from_dict(self.data) # type: Any
webserver_1 |   File "/usr/local/lib/python3.7/site-packages/airflow/serialization/serialized_objects.py", line 719, in from_dict
webserver_1 |     return cls.deserialize_dag(serialized_obj['dag'])
webserver_1 |   File "/usr/local/lib/python3.7/site-packages/airflow/serialization/serialized_objects.py", line 655, in deserialize_dag
webserver_1 |     v = cls._deserialize_timezone(v)
webserver_1 |   File "/usr/local/lib/python3.7/site-packages/pendulum/tz/__init__.py", line 37, in timezone
webserver_1 |     tz = _Timezone(name, extended=extended)
webserver_1 |   File "/usr/local/lib/python3.7/site-packages/pendulum/tz/timezone.py", line 40, in __init__
webserver_1 |     tz = read(name, extend=extended)
webserver_1 |   File "/usr/local/lib/python3.7/site-packages/pendulum/tz/zoneinfo/__init__.py", line 9, in read
webserver_1 |     return Reader(extend=extend).read_for(name)
webserver_1 |   File "/usr/local/lib/python3.7/site-packages/pendulum/tz/zoneinfo/reader.py", line 52, in read_for
webserver_1 |     raise InvalidTimezone(timezone)
webserver_1 | pendulum.tz.zoneinfo.exceptions.InvalidTimezone: Invalid timezone "tzlocal()"
webserver_1 | [2020-12-30 08:55:39,257] {{settings.py:52}} INFO - Configured default timezone Timezone('UTC')
```
</details>
**How to reproduce it**:
I've been able to reproduce this on two systems by using the same DB backup,
but cannot share the DB backup as it contains confidential information.
Any thoughts on how to develop a minimal reproducible test case would be
appreciated!
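
One possible starting point, offered as a rough sketch rather than a confirmed diagnosis: the traceback suggests that a serialized DAG row stores the literal string `tzlocal()` as its timezone, which pendulum then rejects. The snippet below checks a serialized DAG JSON payload for a timezone value that does not look like a zone name. The payload shape (`{"dag": {"timezone": ...}}`), the helper name `suspicious_timezone`, the name-matching regex, and the sample payloads are all assumptions made for illustration; they are not taken from the Airflow code base or from our database.

```python
import json
import re

# IANA-style names look like "UTC" or "America/New_York". This pattern is a
# rough heuristic (an assumption), not the validation pendulum performs.
_TZ_NAME = re.compile(r"^[A-Za-z0-9_+\-]+(/[A-Za-z0-9_+\-]+)*$")


def suspicious_timezone(serialized_json):
    """Return the stored timezone if it does not look like a zone name, else None."""
    data = json.loads(serialized_json)
    # Assumption: the payload has the shape {"dag": {"timezone": ...}},
    # as hinted at by serialized_obj['dag'] in the traceback.
    tz = data.get("dag", {}).get("timezone")
    if isinstance(tz, str) and not _TZ_NAME.match(tz):
        return tz
    return None


# Hypothetical payloads standing in for rows of the serialized DAG table.
good = json.dumps({"dag": {"_dag_id": "ok_dag", "timezone": "UTC"}})
bad = json.dumps({"dag": {"_dag_id": "bad_dag", "timezone": "tzlocal()"}})

print(suspicious_timezone(good))  # None
print(suspicious_timezone(bad))   # tzlocal()
```

If a check like this flags a row in the real database, feeding that row's timezone string to `pendulum.timezone(...)` would presumably raise the same `InvalidTimezone` error as in the log above, which could serve as the core of a test case without sharing the confidential backup.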
**Anything else we need to know**:
We use some of the maintenance DAGs in this repo
(https://github.com/teamclairvoyant/airflow-maintenance-dags), which directly
edit the Airflow metadata DB. I suspect that one of these DAGs may have
corrupted our Airflow DB.