kaxil commented on issue #56647:
URL: https://github.com/apache/airflow/issues/56647#issuecomment-3409103302

   This is a database migration issue caused by invalid UTF-8 data in your XCom 
table. The migration attempts to convert `BYTEA` to `JSONB` but fails when 
encountering corrupted Unicode sequences.
   
   ## Quick Fix
   
   Run this query **before** migration to archive and remove problematic 
records:
   
   ```sql
   BEGIN;
   
   -- Create archive table for corrupted data
   CREATE TABLE IF NOT EXISTS _xcom_corrupted_archive (
       dag_run_id INTEGER NOT NULL,
       task_id VARCHAR(250) NOT NULL,
       map_index INTEGER NOT NULL,
       key VARCHAR(512) NOT NULL,
       dag_id VARCHAR(250) NOT NULL,
       run_id VARCHAR(250) NOT NULL,
       value BYTEA,
       timestamp TIMESTAMP NOT NULL,
       archived_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
       error_reason TEXT DEFAULT 'Invalid UTF-8 encoding',
       PRIMARY KEY (dag_run_id, task_id, map_index, key)
   );
   
   -- Archive corrupted records (can't be converted to valid JSON anyway)
   INSERT INTO _xcom_corrupted_archive (dag_run_id, task_id, map_index, key, 
dag_id, run_id, value, timestamp)
   SELECT dag_run_id, task_id, map_index, key, dag_id, run_id, value, timestamp
   FROM xcom
   WHERE value IS NOT NULL 
     AND get_byte(value, 0) != 128  -- Not pickled data
     AND NOT (
       -- Test UTF-8 validity
       convert_from(value, 'UTF8') IS NOT NULL
     )
   ON CONFLICT DO NOTHING;
   
   -- Delete corrupted records
   DELETE FROM xcom
   WHERE (dag_run_id, task_id, map_index, key) IN (
       SELECT dag_run_id, task_id, map_index, key FROM _xcom_corrupted_archive
   );
   
   -- Show what was archived
   SELECT COUNT(*) as archived_count FROM _xcom_corrupted_archive;
   
   COMMIT;
   ```
   
   After running this, retry `airflow db migrate`. The corrupted data is safely 
archived in `_xcom_corrupted_archive` if you need to investigate later.
   
   **Note:** Backup your database before running this!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to