1fanwang opened a new pull request, #66890:
URL: https://github.com/apache/airflow/pull/66890
Two routes in the task-execution API — `PATCH /task-instances/{id}/run` and
`PATCH /task-instances/{id}/state` — catch `sqlalchemy.exc.SQLAlchemyError`
(the parent class of `DataError`) and re-raise as `HTTPException(500, "Database
error occurred")`. When a worker submits a payload with an oversized field (a
`note`, a `rendered_map_index`, an `external_executor_id`, an `extra` JSON
blob), the DB rejects with `DataError`, the local `except` fires first, and the
worker sees an opaque 500 with no signal that payload size was the cause.
That also defeats the umbrella fix in #66888: registering a global
`DataError → 413/422` handler doesn't help these two routes because their local
catch eats the exception before the global handler sees it.
Tighten both `try` blocks: catch `DataError` explicitly and re-raise
unchanged, so any handler registered on the execution API (the one shipped in
#66888, or a future replacement) can translate it. The `SQLAlchemyError → 500`
fallback for unrelated DB errors stays intact.
```python
except DataError:
# Re-raise unchanged so the app-level _DataErrorHandler can
translate to 4xx.
raise
except SQLAlchemyError:
log.exception("Error marking Task Instance state as running")
raise HTTPException(
status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
detail="Database error occurred"
)
```
## Evidence
Verified with the same TestClient sweep used in #66888, driving real
`create_app()` routes after monkey-patching `Session.flush`/`Session.commit` to
raise the same `DataError` MySQL would raise on an oversized `note`:
```
--- PATCH /task-instances/{id}/state (oversized note) — BEFORE this PR ---
HTTP 500
body: {"message":"Internal server error"}
--- PATCH /task-instances/{id}/state (oversized note) — AFTER this PR ---
HTTP 413
body: {"detail":{"reason":"Payload exceeded database column
limit","orig_error":"(1406, \"Data too long for column 'note' at row
1\")","message":"Database rejected the payload. Reduce the field size, or your
operator may widen the column type (e.g. MEDIUMTEXT / LONGTEXT on MySQL)."}}
```
Same `DataError`, same `orig_error`; the difference is whether the local
`except SQLAlchemyError` got there first.
## Scope
Two `try/except` blocks; no behavioural change for any non-`DataError` SQL
error. Stacks on top of #66888 — independent of the global handler getting
registered, but only delivers user-visible value once the handler is in place.
Safe to merge in either order.
Closes (partially) #66889 — the umbrella issue tracking the same class of
DB-rejected-payload-as-500 failures across REST + execution API endpoints.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]