berglh opened a new issue, #57234:
URL: https://github.com/apache/airflow/issues/57234
### Apache Airflow version
2.11.0
### If "Other Airflow 2/3 version" selected, which one?
_No response_
### What happened?
After waiting for the Airflow 3.1 release so that the initial bugs in Airflow 3 could be resolved, I started the upgrade from Airflow 2.11.0, deployed using the official Helm chart 1.16.0.
## Environment
Amazon EKS (see the Deployment details section below).
I had followed the Airflow 3.0 upgrade guide and ran ruff to check for upgrade warnings, ensuring our DAGs were all compatible with Airflow 3.0 prior to the upgrade (apart from the airflow.sdk-related class/method future-deprecation notifications).
1. I then received the following error in the DB migration job pod, which cyclically restarted until the Helm upgrade failed in Terraform. It's possible this traceback is from a second or third run of the DB migration pod. Essentially, the type conversion of the `value` column from `bytea` to `jsonb` failed.
<details>
<summary>DB Migration Pod Traceback</summary>
```
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 1910, in _execute_context
    self.dialect.do_execute(
  File "/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/engine/default.py", line 736, in do_execute
    cursor.execute(statement, parameters)
psycopg2.errors.InvalidTextRepresentation: invalid input syntax for type json
DETAIL:  Token "nan" is invalid.
CONTEXT:  JSON data, line 1: ...2FSDH3SbQ8Q2TmlrUFDaCptskUXZOdM6bwpZShBKsJfp1"nan...

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/airflow/.local/bin/airflow", line 7, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/airflow/__main__.py", line 55, in main
    args.func(args)
  File "/home/airflow/.local/lib/python3.12/site-packages/airflow/cli/cli_config.py", line 49, in command
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/airflow/utils/cli.py", line 114, in wrapper
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/airflow/utils/providers_configuration_loader.py", line 54, in wrapped_function
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/airflow/cli/commands/db_command.py", line 207, in migratedb
    run_db_migrate_command(args, db.upgradedb, _REVISION_HEADS_MAP)
  File "/home/airflow/.local/lib/python3.12/site-packages/airflow/cli/commands/db_command.py", line 135, in run_db_migrate_command
    command(
  File "/home/airflow/.local/lib/python3.12/site-packages/airflow/utils/session.py", line 100, in wrapper
    return func(*args, session=session, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/airflow/utils/db.py", line 1136, in upgradedb
    command.upgrade(config, revision=to_revision or "heads")
  File "/home/airflow/.local/lib/python3.12/site-packages/alembic/command.py", line 483, in upgrade
    script.run_env()
  File "/home/airflow/.local/lib/python3.12/site-packages/alembic/script/base.py", line 549, in run_env
    util.load_python_file(self.dir, "env.py")
  File "/home/airflow/.local/lib/python3.12/site-packages/alembic/util/pyfiles.py", line 116, in load_python_file
    module = load_module_py(module_id, path)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/alembic/util/pyfiles.py", line 136, in load_module_py
    spec.loader.exec_module(module)  # type: ignore
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen importlib._bootstrap_external>", line 999, in exec_module
  File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
  File "/home/airflow/.local/lib/python3.12/site-packages/airflow/migrations/env.py", line 138, in <module>
    run_migrations_online()
  File "/home/airflow/.local/lib/python3.12/site-packages/airflow/migrations/env.py", line 132, in run_migrations_online
    context.run_migrations()
  File "<string>", line 8, in run_migrations
  File "/home/airflow/.local/lib/python3.12/site-packages/alembic/runtime/environment.py", line 946, in run_migrations
    self.get_context().run_migrations(**kw)
  File "/home/airflow/.local/lib/python3.12/site-packages/alembic/runtime/migration.py", line 627, in run_migrations
    step.migration_fn(**kw)
  File "/home/airflow/.local/lib/python3.12/site-packages/airflow/migrations/versions/0049_3_0_0_remove_pickled_data_from_xcom_table.py", line 125, in upgrade
    op.execute(
  File "<string>", line 8, in execute
  File "<string>", line 3, in execute
  File "/home/airflow/.local/lib/python3.12/site-packages/alembic/operations/ops.py", line 2591, in execute
    return operations.invoke(op)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/alembic/operations/base.py", line 441, in invoke
    return fn(self, operation)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/alembic/operations/toimpl.py", line 240, in execute_sql
    operations.migration_context.impl.execute(
  File "/home/airflow/.local/lib/python3.12/site-packages/alembic/ddl/impl.py", line 253, in execute
    self._exec(sql, execution_options)
  File "/home/airflow/.local/lib/python3.12/site-packages/alembic/ddl/impl.py", line 246, in _exec
    return conn.execute(construct, params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/future/engine.py", line 286, in execute
    return self._execute_20(
           ^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 1710, in _execute_20
    return meth(self, args_10style, kwargs_10style, execution_options)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/sql/elements.py", line 334, in _execute_on_connection
    return connection._execute_clauseelement(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 1577, in _execute_clauseelement
    ret = self._execute_context(
          ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 1953, in _execute_context
    self._handle_dbapi_exception(
  File "/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 2134, in _handle_dbapi_exception
    util.raise_(
  File "/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/util/compat.py", line 211, in raise_
    raise exception
  File "/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 1910, in _execute_context
    self.dialect.do_execute(
  File "/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/engine/default.py", line 736, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.DataError: (psycopg2.errors.InvalidTextRepresentation) invalid input syntax for type json
DETAIL:  Token "nan" is invalid.
CONTEXT:  JSON data, line 1: ...2FSDH3SbQ8Q2TmlrUFDaCptskUXZOdM6bwpZShBKsJfp1"nan...
[SQL:
ALTER TABLE xcom
ALTER COLUMN value TYPE JSONB
USING CASE
    WHEN value IS NOT NULL THEN CAST(CONVERT_FROM(value, 'UTF8') AS JSONB)
    ELSE NULL
END
]
(Background on this error at: https://sqlalche.me/e/14/9h9h)
```
</details>
2. I then tried a variety of things, such as deleting all the xcom entries entirely. I eventually decided to drop the entire Airflow DB and recreate it. After doing this, the deployment started successfully.
3. I then tried to execute a common DAG that does the following:
    - Runs the SparkSubmitOperator to run an RDS-to-S3 (Parquet) job on an Apache Spark (Kubernetes) cluster
    - Runs an Athena query to update the AWS Glue table location in S3
    - Tracks the progress of the Athena query
    - Returns the result of the Athena query to the Airflow log
4. The scheduler cyclically failed to submit the task with the error `(psycopg2.errors.StringDataRightTruncation) value too long for type character varying(20)` on the `callback_request` table: the `callback_type` value `EmailNotificationRequest` exceeds 20 characters (see the sketch after the logs below).
📜**Note**: Please find below all the logs from the pods, grabbed before reverting to Airflow 2.11 / official Helm chart 1.16.0.
[github-api-server-3-1.log](https://github.com/user-attachments/files/23141224/github-api-server-3-1.log)
[github-dag-processor-3-1.log](https://github.com/user-attachments/files/23141228/github-dag-processor-3-1.log)
[github-pgbouncer-3-1.log](https://github.com/user-attachments/files/23141227/github-pgbouncer-3-1.log)
[github-scheduler-3-1.log](https://github.com/user-attachments/files/23141226/github-scheduler-3-1.log)
[github-tirggerer-3-1.log](https://github.com/user-attachments/files/23141222/github-tirggerer-3-1.log)
[github-worker-3-1.log](https://github.com/user-attachments/files/23141225/github-worker-3-1.log)
[postgres-db-upgrade-error.log](https://github.com/user-attachments/files/23141223/postgres-db-upgrade-error.log)
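Regarding point 4, a minimal sketch of how I would confirm the truncation, assuming the Airflow 3.1 `callback_request` schema on Postgres; the commented-out `ALTER` is only a stopgap idea I have not applied, not a proposed fix:
```
-- Confirm the column width that EmailNotificationRequest (24 characters)
-- overflows; the error message points at VARCHAR(20).
SELECT character_maximum_length
FROM information_schema.columns
WHERE table_name = 'callback_request'
  AND column_name = 'callback_type';

-- Possible local stopgap (assumption: nothing else relies on the 20-char
-- width; the proper fix presumably belongs in an Airflow migration):
-- ALTER TABLE callback_request ALTER COLUMN callback_type TYPE VARCHAR(30);
```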
### What you think should happen instead?
1. The DB migration should complete successfully.
2. Tasks with a `callback_type` of `EmailNotificationRequest` should not fail to schedule; I'm not sure if this is something I've missed in the upgrade notes.
### How to reproduce
This is likely a complex issue related to the contents of my `xcom` table. I would be happy to run any queries that would assist in determining the likely cause of the type conversion issue; one possible query is sketched below.
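For instance, a minimal diagnostic sketch, assuming Postgres with plpgsql available (the helper name `xcom_value_is_jsonb` is hypothetical, not part of Airflow); it mirrors the migration's `CAST(CONVERT_FROM(value, 'UTF8') AS JSONB)` expression and lists the rows that would make it fail:
```
-- Hypothetical helper (not part of Airflow): returns false where the
-- migration's jsonb cast would raise InvalidTextRepresentation.
CREATE OR REPLACE FUNCTION xcom_value_is_jsonb(txt text) RETURNS boolean AS $$
BEGIN
    PERFORM txt::jsonb;  -- same conversion the migration applies
    RETURN true;
EXCEPTION WHEN others THEN
    RETURN false;
END;
$$ LANGUAGE plpgsql IMMUTABLE;

-- List offending rows with a truncated preview of the decoded value.
SELECT dag_id, task_id, key,
       LEFT(CONVERT_FROM(value, 'UTF8'), 80) AS value_preview
FROM xcom
WHERE value IS NOT NULL
  AND NOT xcom_value_is_jsonb(CONVERT_FROM(value, 'UTF8'));
```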
Upgrade path: Airflow 2.11 on official Helm chart 1.16.0 -> Airflow 3.1 on official Helm chart 1.18.0.
### Operating System
Amazon Linux AL2023 (EKS v1.31)
### Versions of Apache Airflow Providers
```
apache-airflow-providers-amazon==9.8.0
apache-airflow-providers-apache-spark==5.3.2
apache-airflow-providers-celery==3.11.0
apache-airflow-providers-cncf-kubernetes==10.8.2
apache-airflow-providers-common-compat==1.7.0
apache-airflow-providers-common-io==1.6.0
apache-airflow-providers-common-sql==1.27.1
apache-airflow-providers-docker==4.4.0
apache-airflow-providers-elasticsearch==6.3.0
apache-airflow-providers-fab==1.5.3
apache-airflow-providers-ftp==3.13.0
apache-airflow-providers-google==15.1.0
apache-airflow-providers-grpc==3.8.0
apache-airflow-providers-hashicorp==4.2.0
apache-airflow-providers-http==5.3.0
apache-airflow-providers-imap==3.9.0
apache-airflow-providers-microsoft-azure==12.4.0
apache-airflow-providers-mysql==6.3.0
apache-airflow-providers-odbc==4.10.0
apache-airflow-providers-openlineage==2.3.0
apache-airflow-providers-postgres==6.2.0
apache-airflow-providers-redis==4.1.0
apache-airflow-providers-sendgrid==4.1.0
apache-airflow-providers-sftp==5.3.0
apache-airflow-providers-slack==9.1.0
apache-airflow-providers-smtp==2.1.0
apache-airflow-providers-snowflake==6.3.1
apache-airflow-providers-sqlite==4.1.0
apache-airflow-providers-ssh==4.1.0
apache-airflow-providers-standard==1.9.0
```
### Deployment
Official Apache Airflow Helm Chart
### Deployment details
Airflow is deployed in Amazon EKS using GitLab CI/CD and Terraform.
- **Database**: Aurora Postgres 16.9 (db.r8g.large)
- **Redis**: ElastiCache Redis OSS 7.1.0 (cache.t4g.small)
- **EKS**: v1.31
- **Terraform**: Latest Terraform container & providers for AWS & Helm
- DAGs & custom Python modules are mounted into the Airflow pods using the Amazon S3 driver for EKS (dataset repositories deploy their DAGs to S3).
- Logging is configured to write to S3.
- AWS authentication is handled by EKS IRSA via WebIdentityToken in the Airflow pods. I add OIDC trust relationships to the Airflow IAM role so that the service accounts for the Airflow pods can assume it. Manually running `aws s3` CLI commands in the pods showed working access to AWS resources.
- Custom Airflow container build that does the following:
  - Installs OpenJDK 17 and the AWS CLI
  - Bakes in our Apache Spark build for the SparkSubmitOperator
  - Adds a start-up script that copies some Spark jars from S3 to the Spark `/opt/spark/jars` folder
  - Installs the following providers upgraded to latest: `apache-airflow-providers-cncf-kubernetes`, `apache-airflow-providers-standard`, and `apache-airflow-providers-apache-spark>=5.3.2` for the Spark Job Operator fix for tracking multiple pods (#44994)
- The Terraform configuration establishes the namespace, RDS cluster, Redis OSS ElastiCache, IAM role/policies/OIDC trust policy, VPC/security groups, and all other dependencies for Airflow. After all the dependencies are deployed, the Helm provider deploys Airflow to the airflow namespace using the following values file (in this case for Airflow 3.1):
[airflow-3.1-values](https://github.com/user-attachments/files/23141364/airflow-3.1-values.txt)
### Anything else?
_No response_
### Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
### Code of Conduct
- [x] I agree to follow this project's [Code of
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)