berglh opened a new issue, #57234:
URL: https://github.com/apache/airflow/issues/57234

   ### Apache Airflow version
   
   2.11.0
   
   ### If "Other Airflow 2/3 version" selected, which one?
   
   _No response_
   
   ### What happened?
   
   After waiting for the Airflow 3.1 release so that the initial Airflow 3 bugs would be resolved, I started the upgrade from Airflow 2.11.0, deployed using the official Helm chart 1.16.0.
   
   ## Environment
   
   Amazon EKS
   
   I had followed the Airflow 3.0 upgrade guide and ran ruff to check for upgrade warnings, ensuring our DAGs were all compatible with Airflow 3.0 prior to the upgrade (apart from the airflow.sdk-related class/method future-deprecation notifications).
   
   1. I then received the following error in the DB migration job pod, which cyclically restarted until the Helm upgrade failed in Terraform. It's possible this traceback was from a second or third run of the airflow DB migration pod. Essentially, the type conversion of the `value` column from `bytea` to `jsonb` failed (a diagnostic query is sketched after this list).
   
   <details>
     <summary>DB Migration Pod Traceback</summary>
   
   ```
   Traceback (most recent call last):
     File 
"/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/engine/base.py", 
line 1910, in _execute_context
       self.dialect.do_execute(
     File 
"/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/engine/default.py",
 line 736, in do_execute
       cursor.execute(statement, parameters)
   psycopg2.errors.InvalidTextRepresentation: invalid input syntax for type json
   DETAIL:  Token "nan" is invalid.
   CONTEXT:  JSON data, line 1: 
...2FSDH3SbQ8Q2TmlrUFDaCptskUXZOdM6bwpZShBKsJfp1"nan...
   
   
   The above exception was the direct cause of the following exception:
   
   Traceback (most recent call last):
     File "/home/airflow/.local/bin/airflow", line 7, in <module>
       sys.exit(main())
                ^^^^^^
     File 
"/home/airflow/.local/lib/python3.12/site-packages/airflow/__main__.py", line 
55, in main
       args.func(args)
     File 
"/home/airflow/.local/lib/python3.12/site-packages/airflow/cli/cli_config.py", 
line 49, in command
       return func(*args, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^
     File 
"/home/airflow/.local/lib/python3.12/site-packages/airflow/utils/cli.py", line 
114, in wrapper
       return f(*args, **kwargs)
              ^^^^^^^^^^^^^^^^^^
     File 
"/home/airflow/.local/lib/python3.12/site-packages/airflow/utils/providers_configuration_loader.py",
 line 54, in wrapped_function
       return func(*args, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^
     File 
"/home/airflow/.local/lib/python3.12/site-packages/airflow/cli/commands/db_command.py",
 line 207, in migratedb
       run_db_migrate_command(args, db.upgradedb, _REVISION_HEADS_MAP)
     File 
"/home/airflow/.local/lib/python3.12/site-packages/airflow/cli/commands/db_command.py",
 line 135, in run_db_migrate_command
       command(
     File 
"/home/airflow/.local/lib/python3.12/site-packages/airflow/utils/session.py", 
line 100, in wrapper
       return func(*args, session=session, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File 
"/home/airflow/.local/lib/python3.12/site-packages/airflow/utils/db.py", line 
1136, in upgradedb
       command.upgrade(config, revision=to_revision or "heads")
     File 
"/home/airflow/.local/lib/python3.12/site-packages/alembic/command.py", line 
483, in upgrade
       script.run_env()
     File 
"/home/airflow/.local/lib/python3.12/site-packages/alembic/script/base.py", 
line 549, in run_env
       util.load_python_file(self.dir, "env.py")
     File 
"/home/airflow/.local/lib/python3.12/site-packages/alembic/util/pyfiles.py", 
line 116, in load_python_file
       module = load_module_py(module_id, path)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File 
"/home/airflow/.local/lib/python3.12/site-packages/alembic/util/pyfiles.py", 
line 136, in load_module_py
       spec.loader.exec_module(module)  # type: ignore
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "<frozen importlib._bootstrap_external>", line 999, in exec_module
     File "<frozen importlib._bootstrap>", line 488, in 
_call_with_frames_removed
     File 
"/home/airflow/.local/lib/python3.12/site-packages/airflow/migrations/env.py", 
line 138, in <module>
       run_migrations_online()
     File 
"/home/airflow/.local/lib/python3.12/site-packages/airflow/migrations/env.py", 
line 132, in run_migrations_online
       context.run_migrations()
     File "<string>", line 8, in run_migrations
     File 
"/home/airflow/.local/lib/python3.12/site-packages/alembic/runtime/environment.py",
 line 946, in run_migrations
       self.get_context().run_migrations(**kw)
     File 
"/home/airflow/.local/lib/python3.12/site-packages/alembic/runtime/migration.py",
 line 627, in run_migrations
       step.migration_fn(**kw)
     File 
"/home/airflow/.local/lib/python3.12/site-packages/airflow/migrations/versions/0049_3_0_0_remove_pickled_data_from_xcom_table.py",
 line 125, in upgrade
       op.execute(
     File "<string>", line 8, in execute
     File "<string>", line 3, in execute
     File 
"/home/airflow/.local/lib/python3.12/site-packages/alembic/operations/ops.py", 
line 2591, in execute
       return operations.invoke(op)
              ^^^^^^^^^^^^^^^^^^^^^
     File 
"/home/airflow/.local/lib/python3.12/site-packages/alembic/operations/base.py", 
line 441, in invoke
       return fn(self, operation)
              ^^^^^^^^^^^^^^^^^^^
     File 
"/home/airflow/.local/lib/python3.12/site-packages/alembic/operations/toimpl.py",
 line 240, in execute_sql
       operations.migration_context.impl.execute(
     File 
"/home/airflow/.local/lib/python3.12/site-packages/alembic/ddl/impl.py", line 
253, in execute
       self._exec(sql, execution_options)
     File 
"/home/airflow/.local/lib/python3.12/site-packages/alembic/ddl/impl.py", line 
246, in _exec
       return conn.execute(construct, params)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File 
"/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/future/engine.py",
 line 286, in execute
       return self._execute_20(
              ^^^^^^^^^^^^^^^^^
     File 
"/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/engine/base.py", 
line 1710, in _execute_20
       return meth(self, args_10style, kwargs_10style, execution_options)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File 
"/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/sql/elements.py", 
line 334, in _execute_on_connection
       return connection._execute_clauseelement(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File 
"/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/engine/base.py", 
line 1577, in _execute_clauseelement
       ret = self._execute_context(
             ^^^^^^^^^^^^^^^^^^^^^^
     File 
"/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/engine/base.py", 
line 1953, in _execute_context
       self._handle_dbapi_exception(
     File 
"/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/engine/base.py", 
line 2134, in _handle_dbapi_exception
       util.raise_(
     File 
"/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/util/compat.py", 
line 211, in raise_
       raise exception
     File 
"/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/engine/base.py", 
line 1910, in _execute_context
       self.dialect.do_execute(
     File 
"/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/engine/default.py",
 line 736, in do_execute
       cursor.execute(statement, parameters)
   sqlalchemy.exc.DataError: (psycopg2.errors.InvalidTextRepresentation) 
invalid input syntax for type json
   DETAIL:  Token "nan" is invalid.
   CONTEXT:  JSON data, line 1: 
...2FSDH3SbQ8Q2TmlrUFDaCptskUXZOdM6bwpZShBKsJfp1"nan...
   
   [SQL: 
               ALTER TABLE xcom
               ALTER COLUMN value TYPE JSONB
               USING CASE
                   WHEN value IS NOT NULL THEN CAST(CONVERT_FROM(value, 'UTF8') 
AS JSONB)
                   ELSE NULL
               END
               ]
   (Background on this error at: https://sqlalche.me/e/14/9h9h)
   ```
   </details>
   
   2. I then tried a variety of things, like deleting all the xcom entries entirely. I eventually decided to delete the entire airflow DB and recreate it. After doing this, the deployment started successfully.
   3. I then tried to execute a common DAG that does the following:
      - Runs the SparkSubmitOperator to run an RDS to S3 (Parquet) job in an Apache Spark (Kubernetes) cluster
      - Runs an Athena query to update the AWS Glue Table location in S3
      - Tracks the progress of the Athena query
      - Returns the result of the Athena query to the Airflow log
   4. The scheduler cyclically failed to submit the task with the error `(psycopg2.errors.StringDataRightTruncation) value too long for type character varying(20)` on the `callback_request` table: the `callback_type` value `EmailNotificationRequest` exceeds 20 characters in length.
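   
   Regarding the failed cast in step 1, here is a minimal diagnostic sketch I could run against the 2.11 database to list the offending rows before migrating. The `try_cast_jsonb` helper is something I made up for this check (not part of the migration), and the column names assume the stock Airflow 2.11 `xcom` schema:
   
   ```sql
   -- Hypothetical helper: returns NULL instead of raising when a bytea
   -- payload is not valid UTF-8 JSON.
   CREATE OR REPLACE FUNCTION try_cast_jsonb(v bytea) RETURNS jsonb
   LANGUAGE plpgsql AS $$
   BEGIN
       RETURN CONVERT_FROM(v, 'UTF8')::jsonb;
   EXCEPTION WHEN OTHERS THEN
       RETURN NULL;
   END;
   $$;
   
   -- Rows that would make the migration's ALTER COLUMN ... USING cast fail.
   SELECT dag_id, task_id, run_id, key
   FROM xcom
   WHERE value IS NOT NULL
     AND try_cast_jsonb(value) IS NULL;
   ```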
   
   📜**Note**: Please find below all the logs from the pods, which I grabbed before reverting to Airflow 2.11 / official Helm chart 1.16.0.
   
   
[github-api-server-3-1.log](https://github.com/user-attachments/files/23141224/github-api-server-3-1.log)
   
[github-dag-processor-3-1.log](https://github.com/user-attachments/files/23141228/github-dag-processor-3-1.log)
   
[github-pgbouncer-3-1.log](https://github.com/user-attachments/files/23141227/github-pgbouncer-3-1.log)
   
[github-scheduler-3-1.log](https://github.com/user-attachments/files/23141226/github-scheduler-3-1.log)
   
[github-tirggerer-3-1.log](https://github.com/user-attachments/files/23141222/github-tirggerer-3-1.log)
   
[github-worker-3-1.log](https://github.com/user-attachments/files/23141225/github-worker-3-1.log)
   
[postgres-db-upgrade-error.log](https://github.com/user-attachments/files/23141223/postgres-db-upgrade-error.log)
   
   ### What you think should happen instead?
   
   1. The db migration should complete successfully.
   2. Tasks with a `callback_request` of `EmailNotificationRequest` should not fail to schedule; I'm not sure whether this is something I've missed in the upgrade notes (a quick column-width check is sketched below).
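   
   To illustrate the length mismatch behind the second point, a check along these lines should confirm it (the table and column names are taken from the error message, so treat them as an assumption):
   
   ```sql
   -- Reported column width (the error says character varying(20)).
   SELECT character_maximum_length
   FROM information_schema.columns
   WHERE table_name = 'callback_request'
     AND column_name = 'callback_type';
   
   -- The value the scheduler tries to store is longer than that.
   SELECT length('EmailNotificationRequest') AS value_length;  -- 24 > 20
   ```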
   
   ### How to reproduce
   
   This is likely a complex issue related to the contents of my `xcom` table; I would be happy to perform any queries that would assist in determining the likely cause of the type conversion issue.
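   
   The `DETAIL: Token "nan" is invalid` suggests a serialized non-finite float in one of the payloads (a bare `nan` token is emitted by some serializers but is not legal JSON). Assuming that, the failing cast can be reproduced in isolation:
   
   ```sql
   -- Mirrors the migration's CONVERT_FROM(...)::jsonb on a payload
   -- containing the bare token nan, which is not valid JSON.
   SELECT CONVERT_FROM('nan'::bytea, 'UTF8')::jsonb;
   -- ERROR:  invalid input syntax for type json
   -- DETAIL: Token "nan" is invalid.
   ```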
   
   Upgraded from Airflow 2.11 on the official Helm chart 1.16.0 to Airflow 3.1 on the official Helm chart 1.18.0.
   
   ### Operating System
   
   Amazon Linux AL2023 (EKS v1.31)
   
   ### Versions of Apache Airflow Providers
   
   ```
   apache-airflow-providers-amazon==9.8.0
   apache-airflow-providers-apache-spark==5.3.2
   apache-airflow-providers-celery==3.11.0
   apache-airflow-providers-cncf-kubernetes==10.8.2
   apache-airflow-providers-common-compat==1.7.0
   apache-airflow-providers-common-io==1.6.0
   apache-airflow-providers-common-sql==1.27.1
   apache-airflow-providers-docker==4.4.0
   apache-airflow-providers-elasticsearch==6.3.0
   apache-airflow-providers-fab==1.5.3
   apache-airflow-providers-ftp==3.13.0
   apache-airflow-providers-google==15.1.0
   apache-airflow-providers-grpc==3.8.0
   apache-airflow-providers-hashicorp==4.2.0
   apache-airflow-providers-http==5.3.0
   apache-airflow-providers-imap==3.9.0
   apache-airflow-providers-microsoft-azure==12.4.0
   apache-airflow-providers-mysql==6.3.0
   apache-airflow-providers-odbc==4.10.0
   apache-airflow-providers-openlineage==2.3.0
   apache-airflow-providers-postgres==6.2.0
   apache-airflow-providers-redis==4.1.0
   apache-airflow-providers-sendgrid==4.1.0
   apache-airflow-providers-sftp==5.3.0
   apache-airflow-providers-slack==9.1.0
   apache-airflow-providers-smtp==2.1.0
   apache-airflow-providers-snowflake==6.3.1
   apache-airflow-providers-sqlite==4.1.0
   apache-airflow-providers-ssh==4.1.0
   apache-airflow-providers-standard==1.9.0
   ```
   
   ### Deployment
   
   Official Apache Airflow Helm Chart
   
   ### Deployment details
   
   Airflow is deployed in Amazon EKS using GitLab CI/CD and Terraform.
   
   - **Database**: Aurora Postgres 16.9 (db.r8g.large)
   - **Redis**: ElastiCache Redis OSS 7.1.0 (cache.t4g.small)
   - **EKS**: v1.31
   - **Terraform**: Latest Terraform container & providers for AWS & Helm
   - DAGs & custom Python modules are mounted into Airflow pods using the Amazon S3 driver for EKS (dataset repositories deploy their DAGs to S3).
   - Logging is configured to log to S3
   - AWS authentication is handled by EKS IRSA via WebIdentityToken in the 
Airflow pods. I add OIDC trust relationships to the Airflow IAM role to allow 
the service accounts for the Airflow pods to assume the IAM role. Manually 
running aws s3 CLI commands in the pods showed working access to AWS resources.
   - Custom Airflow container build that does the following:
     - Installs openjdk17/aws cli
     - Bakes in our Apache Spark build for SparkSubmitOperator
     - Adds a start-up script that copies some Spark jars from S3 to the Spark 
`/opt/spark/jars` folder
     - Installs the following providers upgraded to latest: `apache-airflow-providers-cncf-kubernetes`, `apache-airflow-providers-standard` and `apache-airflow-providers-apache-spark>=5.3.2` for the Spark job operator fix for tracking multiple pods (#44994)
   - The Terraform configuration establishes the namespace, RDS cluster, Redis OSS ElastiCache, IAM role/policies/OIDC trust policy, VPC/security groups and all other dependencies for Airflow. After all the dependencies are deployed, the Helm provider deploys Airflow to the airflow namespace using the following values configuration (in this case for Airflow 3.1): [airflow-3.1-values](https://github.com/user-attachments/files/23141364/airflow-3.1-values.txt)
   
   ### Anything else?
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

