laurenhall1526-netizen commented on issue #64476:
URL: https://github.com/apache/airflow/issues/64476#issuecomment-4156736344

   Thanks for the quick look!
   Digging a bit more, it seems the crash is triggered by a 401 from the
   internal API (/execution/task-instances/.../run), which then bubbles up and
   takes down the scheduler.
   Happy to try any suggestions or test a fix.
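
A minimal sketch of the failure mode, assuming the crash comes from the exception failing to unpickle on the scheduler side (as the quoted comment below suggests). `KeywordOnlyError` is a hypothetical stand-in shaped like `httpx.HTTPStatusError`, not the real class, and `round_trip` is an illustrative helper rather than anything in Airflow:

```python
import pickle


class KeywordOnlyError(Exception):
    """Hypothetical stand-in with keyword-only __init__ arguments."""

    def __init__(self, message, *, request, response):
        super().__init__(message)  # only `message` ends up in self.args
        self.request = request
        self.response = response


def round_trip(exc):
    """Pickle then unpickle `exc`; return the restored object or the TypeError."""
    payload = pickle.dumps(exc)  # pickling itself succeeds
    try:
        # Unpickling calls KeywordOnlyError(*args) without request/response.
        return pickle.loads(payload)
    except TypeError as err:
        return err


result = round_trip(
    KeywordOnlyError("401 Unauthorized", request="req", response="resp")
)
print(type(result).__name__, result)
```

If this matches what the scheduler sees, the error is raised by the process that reads the queue, not by the worker that wrote to it.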
   
   On Mon, Mar 30, 2026, 8:17 AM byteforged ***@***.***> wrote:
   
   > *byteforged-dot-com* left a comment (apache/airflow#64476)
   > <https://github.com/apache/airflow/issues/64476#issuecomment-4154975118>
   >
   >
   > *Two issues are at play here: a misconfiguration and a real Airflow bug.*
   >
   > *Root cause (misconfiguration):* The 401 Unauthorized on the execution API
   > endpoint is caused by a JWT authentication failure between the worker
   > subprocess and the API server. The warning in the logs,
   > InsecureKeyLengthWarning: The HMAC key is 32 bytes long, indicates that the
   > secret in apiSecretKeySecretName is too short and/or not shared
   > consistently across all components (scheduler, api-server, triggerer). In
   > Airflow 3, all components must share an identical, sufficiently long
   > (≥ 64 bytes) key. Regenerating the secret with secrets.token_hex(64) and
   > restarting all pods resolved this for us.
   >
   > *However, there IS a real Airflow bug here:* When the worker subprocess
   > raises an httpx.HTTPStatusError (or any other unpicklable exception) and
   > puts it into the multiprocessing.Queue, the scheduler crashes on
   > deserialization with:
   >
   > TypeError: HTTPStatusError.__init__() missing 2 required keyword-only
   > arguments: 'request' and 'response'
   >
   > This is because httpx.HTTPStatusError does not survive a pickle round
   > trip: its __init__ requires the keyword-only arguments request and
   > response, which unpickling does not supply. Airflow's LocalExecutor
   > should wrap any exception in a picklable container (e.g., a string or a
   > simple wrapper exception class) before putting it on the result queue.
   > The current behavior turns any such worker-side exception into a full
   > scheduler crash, which is clearly wrong: the scheduler should mark the
   > task as failed and keep running.
   >
   > The fix in local_executor.py should be somewhere around the result queue
   > put/get path: either serialize exceptions to strings before enqueueing, or
   > catch unpickling errors on the get side and handle them gracefully.
   >
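
The key regeneration step mentioned in the quoted comment can be sketched as follows. The 64-byte threshold is taken from that comment and is treated here as an assumption; `generate_secret` and `is_long_enough` are hypothetical helper names, and writing the value into the Kubernetes secret is not shown:

```python
import secrets

# Minimum key length in bytes, as stated in the quoted comment (assumption).
MIN_KEY_BYTES = 64


def generate_secret(n_bytes: int = 64) -> str:
    """Return a random hex string; token_hex(n) yields 2 * n hex characters."""
    return secrets.token_hex(n_bytes)


def is_long_enough(secret: str, minimum: int = MIN_KEY_BYTES) -> bool:
    """Check the encoded length of the secret against the minimum."""
    return len(secret.encode("utf-8")) >= minimum


key = generate_secret()
print(len(key), is_long_enough(key))
```

Every component would need to read the same generated value; generating a fresh key per component would reproduce the mismatch.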
   
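
One shape the "picklable container" suggestion from the quoted comment could take is sketched below. This is not Airflow's code: `WorkerFailure` and `to_picklable` are hypothetical names, and `KeywordOnlyError` is a stand-in for an exception that cannot be unpickled.

```python
import pickle
import traceback
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkerFailure:
    """Hypothetical picklable summary of a worker-side exception."""

    exc_type: str
    message: str
    traceback_text: str


def to_picklable(exc: BaseException) -> WorkerFailure:
    """Flatten an exception into plain strings that are safe to enqueue."""
    return WorkerFailure(
        exc_type=type(exc).__qualname__,
        message=str(exc),
        traceback_text="".join(
            traceback.format_exception(type(exc), exc, exc.__traceback__)
        ),
    )


class KeywordOnlyError(Exception):
    """Stand-in for an exception whose __init__ needs keyword-only args."""

    def __init__(self, message, *, request, response):
        super().__init__(message)
        self.request = request
        self.response = response


try:
    raise KeywordOnlyError("401 Unauthorized", request="req", response="resp")
except Exception as exc:
    failure = to_picklable(exc)  # convert before anything is put on a queue

restored = pickle.loads(pickle.dumps(failure))
print(restored.exc_type, restored.message)
```

Converting on the worker side keeps the queue payload to plain strings, so the reader never has to reconstruct the original exception class.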


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
