The GitHub Actions job "Tests" on airflow.git/cache-revoked-token-is-revoked has failed. Run started by GitHub user antonlin1 (triggered by antonlin1).
Head commit for run:
c4a8c1dde8ed0c3614a76df3dd19334fa832e53a / Anton Lin <[email protected]>

Cache RevokedToken.is_revoked to avoid per-request DB roundtrip

Since 3.2 (b3306f15cd, "AIP-84: Add JWT token revokation for logout
invalidation"), every authenticated API request runs a synchronous
``RevokedToken.is_revoked(jti)`` DB query inside the FastAPI auth
dependency. The query is dispatched via ``@provide_session`` which
checks out a SQLAlchemy connection per in-flight request. With the
default pool of ``5+10=15`` shared across api-server, scheduler,
dag-processor, and triggerer, modest concurrent load (UI multi-endpoint
polling, fan-out DAGs) exhausts the pool and request handlers time out
in ``QueuePool._do_get`` after 30 s. Observed locally as::

    sqlalchemy.exc.TimeoutError: QueuePool limit of size 5 overflow 10 reached, connection timed out, timeout 30.00

Cache hit rate on ``is_revoked`` is ≈100% in practice, since revocation
only happens on explicit logout.

Wrap ``is_revoked`` in a process-local ``cachetools.TTLCache`` (existing
dep) guarded by an ``RLock``, mirroring the ``DBDagBag`` pattern at
``airflow/models/dagbag.py``: cache lookup first, double-checked locking
on miss, ``Stats.cache_hit/cache_miss`` metrics, and a public
``clear_cache()`` for operators. ``revoke()`` populates the local cache
on success so the worker that processes a logout is immediately
consistent.

Two new ``[api_auth]`` config keys (``revoked_token_cache_size`` default
10000, ``revoked_token_cache_ttl_seconds`` default 60) make the cache
tunable; setting either to 0 disables caching and reverts to the
per-request DB query behavior.

Trade-offs:

* uvicorn workers don't share memory, so a logout on worker A is not
  immediately reflected on worker B: worker B serves the cached
  pre-logout result for up to ``revoked_token_cache_ttl_seconds``
  seconds. Operators needing strict cross-worker logout consistency can
  reduce or zero out the TTL.
* Expired JWTs are rejected by ``avalidated_claims`` (PyJWT ``exp``
  check) before ``is_revoked`` runs, so cached entries cannot leak past
  the token's natural lifetime.
* ``_maybe_cleanup_expired`` is called BEFORE the cache lookup so the
  periodic TTL sweep keeps running even when most calls are cache hits.

Local before/after with default 3.2.0, ``pool_size=3``,
``max_overflow=2``, 60 requests at concurrency 30 against
``GET /api/v2/dags``:

================ ======== ========== ============
Metric           STOCK    CACHE_FIX  improvement
================ ======== ========== ============
Wall time        31.0 s   1.04 s     ~30x
Success rate     88%      100%       +12pp
Pool timeouts    7/60     0          gone
Latency p50      15.3 s   0.51 s     ~30x
Latency p95      30.5 s   0.57 s     ~53x
Latency p99      30.7 s   0.60 s     ~51x
================ ======== ========== ============

Eight new unit tests in ``tests/unit/models/test_revoked_token.py``
cover both polarities of caching, TTL/size opt-out, ``revoke()`` cache
population, the cleanup-still-runs-on-cache-hit invariant,
``clear_cache``, and the ``revoke()`` no-op-on-merge-failure path.

Report URL: https://github.com/apache/airflow/actions/runs/25461235521

With regards,
GitHub Actions via GitBox

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
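
For readers skimming this notification, the caching pattern described in the commit message (cache lookup first, double-checked locking on a miss, cache population on ``revoke()``, and a size or TTL of 0 as opt-out) can be sketched roughly as follows. This is not the code from the commit. ``SimpleTTLCache`` is a small stdlib stand-in for ``cachetools.TTLCache``, the class name ``RevocationCache`` and the ``db_is_revoked`` / ``db_revoke`` callables are hypothetical placeholders for the real DB access, and the plain ``hits`` / ``misses`` counters stand in for the ``Stats`` metrics.

```python
import threading
import time

_MISSING = object()  # sentinel: distinguishes "not cached" from a cached False


class SimpleTTLCache:
    """Minimal stdlib stand-in for cachetools.TTLCache (illustration only)."""

    def __init__(self, maxsize, ttl):
        self.maxsize = maxsize
        self.ttl = ttl
        self._data = {}  # key -> (value, expiry timestamp)

    def get(self, key, default=None):
        entry = self._data.get(key)
        if entry is None or entry[1] <= time.monotonic():
            return default  # absent or expired
        return entry[0]

    def put(self, key, value):
        if key not in self._data and len(self._data) >= self.maxsize:
            del self._data[next(iter(self._data))]  # evict the oldest insertion
        self._data[key] = (value, time.monotonic() + self.ttl)

    def clear(self):
        self._data.clear()


class RevocationCache:
    """Hypothetical process-local cache in front of a revocation lookup."""

    def __init__(self, db_is_revoked, db_revoke, maxsize=10000, ttl_seconds=60):
        self._db_is_revoked = db_is_revoked  # assumed callable: jti -> bool
        self._db_revoke = db_revoke          # assumed callable: jti -> None
        # Setting either knob to 0 disables caching entirely.
        self._enabled = maxsize > 0 and ttl_seconds > 0
        self._cache = SimpleTTLCache(maxsize, ttl_seconds)
        self._lock = threading.RLock()
        self.hits = 0    # approximate counters; updates are not synchronized
        self.misses = 0

    def is_revoked(self, jti):
        if not self._enabled:
            return self._db_is_revoked(jti)
        # First check without the lock (a plain dict read in this stand-in).
        cached = self._cache.get(jti, _MISSING)
        if cached is not _MISSING:
            self.hits += 1
            return cached
        with self._lock:
            # Second check: another thread may have filled the entry meanwhile.
            cached = self._cache.get(jti, _MISSING)
            if cached is not _MISSING:
                self.hits += 1
                return cached
            self.misses += 1
            result = self._db_is_revoked(jti)
            self._cache.put(jti, result)
            return result

    def revoke(self, jti):
        self._db_revoke(jti)  # if this raises, the cache is left untouched
        if self._enabled:
            with self._lock:
                self._cache.put(jti, True)

    def clear_cache(self):
        with self._lock:
            self._cache.clear()
```

In this sketch each process would hold its own ``RevocationCache`` instance, which is why a revocation recorded in one process can remain invisible to another until the cached entry expires, as the trade-offs in the commit message point out.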
