GayathriSrividya opened a new pull request, #68054:
URL: https://github.com/apache/airflow/pull/68054

   closes: #67939
   
   ## Problem
   
   Long-running tasks fail with repeated 403 errors when a heartbeat arrives 
with a JWT token that has only milliseconds of validity left. The race:
   
   1. `JWTBearer` validates the token — still valid at that instant
   2. Handler completes (heartbeat → 200)
   3. `JWTReissueMiddleware` calls `avalidated_claims` a **second time** on the 
same token
   4. Token has now crossed its expiry boundary → `ExpiredSignatureError`
   5. Exception is swallowed, no `Refreshed-API-Token` header is set
   6. Client's next heartbeat (30 s later) arrives with a fully expired token → 
403
   7. After `MAX_FAILED_HEARTBEATS` the supervisor kills the task
   
   The race is rare (only triggers when a heartbeat lands with < 1 s left on 
the token), which explains why tasks run fine for ~100 minutes and then 
suddenly die.
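
The sequence above can be reproduced in a few lines. This is a simplified sketch, not Airflow code: `validate`, `handle_heartbeat_old`, the claim values, and the timestamps are all hypothetical stand-ins, and the clock is passed in explicitly so the timing window is deterministic.

```python
class ExpiredSignatureError(Exception):
    """Stand-in for the expiry error raised by the JWT library."""


def validate(claims, now):
    # Hypothetical validator: rejects the token once `now` reaches `exp`.
    if now >= claims["exp"]:
        raise ExpiredSignatureError("token expired")
    return claims


def handle_heartbeat_old(claims, t_auth, t_middleware):
    """Model of the pre-fix flow: the same token is validated twice."""
    validate(claims, t_auth)  # step 1: auth check passes
    headers = {}  # step 2: handler completes with 200
    try:
        validate(claims, t_middleware)  # step 3: second validation
        headers["Refreshed-API-Token"] = "new-token"
    except ExpiredSignatureError:
        pass  # step 5: error swallowed, no refresh header
    return 200, headers


# Token expires at t=1000.0; the heartbeat arrives 5 ms before expiry
# and the middleware runs 10 ms after it.
claims = {"sub": "ti-123", "exp": 1000.0}
status, headers = handle_heartbeat_old(claims, 999.995, 1000.010)
```

The request itself succeeds, but the client never receives a replacement token, so its next heartbeat is sent with the expired one.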
   
   ## Fix
   
   `JWTBearer` already validates the token and caches the resulting `TIToken` 
on `request.scope`. `JWTReissueMiddleware` now reads directly from 
`request.scope` instead of re-parsing the `Authorization` header and calling 
`avalidated_claims` again. This avoids a second validation pass and does not 
add any expiry leeway.
   
   If the token crosses its expiry boundary during request processing, 
`valid_left` will be ≤ `refresh_when_less_than`, so a fresh token is still 
issued from the claims that were already validated for this request. The 
reissued token uses the same `sub`, `scope`, and `ti_id`, and no expiry leeway 
is added.
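
A minimal sketch of the reissue decision described above, assuming the cached token can be modelled as a plain dict. The function `maybe_reissue`, the `validity` parameter, the key's string value, and the claim values are illustrative assumptions; the real `TIToken` type and middleware are not shown in this description.

```python
# Placeholder value; the real constant's value is not given in this PR text.
_REQUEST_SCOPE_TOKEN_KEY = "ti_token"


def maybe_reissue(scope, now, refresh_when_less_than, validity):
    """Return claims for a fresh token, or None if no refresh is needed.

    Reads the claims cached on the request scope by the auth step
    instead of validating the raw token a second time.
    """
    cached = scope.get(_REQUEST_SCOPE_TOKEN_KEY)
    if cached is None:
        return None  # request was not authenticated; nothing to reissue
    valid_left = cached["exp"] - now
    if valid_left > refresh_when_less_than:
        return None  # plenty of validity left
    # valid_left may be negative here (token expired mid-request); the
    # claims were already validated for this request, so reissue anyway.
    return {
        "sub": cached["sub"],
        "scope": cached["scope"],
        "ti_id": cached["ti_id"],
        "exp": now + validity,
    }


scope = {
    _REQUEST_SCOPE_TOKEN_KEY: {
        "sub": "ti-123",
        "scope": "execution",
        "ti_id": "abc",
        "exp": 1000,
    }
}
# The token expired one second before the middleware ran.
fresh = maybe_reissue(scope, now=1001, refresh_when_less_than=60, validity=600)
```

Because the decision depends only on `valid_left`, a token that expires between the auth check and the middleware takes the same branch as one that is merely close to expiry.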
   
   
   ## Changes
   
   - `app.py`: read `TIToken` from `request.scope[_REQUEST_SCOPE_TOKEN_KEY]` 
(cached by `JWTBearer`) instead of re-parsing the `Authorization` header and 
calling `avalidated_claims` a second time
   - `test_router.py`: assert `avalidated_claims` is called exactly once per 
request; add a regression test verifying that the middleware reissues from 
cached `JWTBearer` claims without re-validating the token
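
The "called exactly once" assertion can be illustrated with a toy request flow. This is not the actual test from `test_router.py`: `auth_dependency`, `reissue_middleware`, `handle_request`, and `TOKEN_KEY` are hypothetical names, and the validator is a mock standing in for `avalidated_claims`.

```python
import asyncio
from unittest.mock import AsyncMock

TOKEN_KEY = "ti_token"  # hypothetical scope key


async def auth_dependency(scope, validator, raw_token):
    # Validate once and cache the result on the request scope.
    scope[TOKEN_KEY] = await validator(raw_token)


async def reissue_middleware(scope):
    # Read the cached claims; never call the validator again.
    return scope.get(TOKEN_KEY)


async def handle_request(validator, raw_token):
    scope = {}
    await auth_dependency(scope, validator, raw_token)
    return await reissue_middleware(scope)


validator = AsyncMock(return_value={"sub": "ti-123"})
claims = asyncio.run(handle_request(validator, "raw.jwt.value"))

# Fails if the validator was awaited more than once per request.
validator.assert_awaited_once_with("raw.jwt.value")
```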

