[
https://issues.apache.org/jira/browse/SPARK-57425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18088838#comment-18088838
]
Shrirang Mhalgi edited comment on SPARK-57425 at 6/14/26 4:48 AM:
------------------------------------------------------------------
I would like to work on this
was (Author: JIRAUSER313104):
Id like to work on this
> Reattach iterator cannot recover when short-TTL credentials expire mid-stream
> -----------------------------------------------------------------------------
>
> Key: SPARK-57425
> URL: https://issues.apache.org/jira/browse/SPARK-57425
> Project: Spark
> Issue Type: Bug
> Components: Connect
> Affects Versions: 4.1.0, 4.0.0, 4.2.0, 5.0.0
> Reporter: Daisuke Taniwaki
> Priority: Major
>
> `ExecutePlanResponseReattachableIterator`
> (`python/pyspark/sql/connect/client/reattach.py`) has a reattach mechanism
> designed to recover when the underlying gRPC stream is broken before
> `ResultComplete`. That recovery is structurally impossible when the
> server enforces a short auth-token TTL (e.g. AWS Athena Spark, 30 min):
> 1. `ExecutePlan` is started with a fresh credential.
> 2. The query runs past the TTL; the server kills the stream with
> `PERMISSION_DENIED`.
> 3. The default retry policy does not treat `PERMISSION_DENIED` as
> retryable, so the iterator never even attempts to reattach.
> 4. Even if reattach were attempted, `self._metadata` still holds the
> expired token captured at `__init__`, so it would immediately fail
> with the same `PERMISSION_DENIED`.
> The iterator's own contract ("recover from broken stream") is violated
> for any deployment that combines short token TTLs with long-running
> streams. Both gaps must be fixed for the reattach machinery to do what
> it was designed to do.
> This has not surfaced in typical deployments because four conditions
> must align (short server TTL, a stream that outlives it, a server that
> actively kills the stream on expiry, and reattach firing). Local dev
> without auth, on-prem with long-lived tokens, and short ad-hoc queries
> each fail to meet at least one. Managed federated-credential environments
> hit all four; Athena Spark Connect with its 30-minute auth token is the
> canonical trigger.
> The dbt-athena Spark adapter ships runtime monkey-patches today as a
> verified workaround. They have been in production use long enough to
> confirm the behaviour is safe. The fix here folds the moving parts into
> upstream so the workaround becomes unnecessary.
> Backport requested to branch-4.0, branch-4.1, branch-4.2 — 4.x is what
> managed environments actually run.
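
The two gaps in the description (a retry policy that ignores `PERMISSION_DENIED`, and metadata frozen at `__init__`) can be illustrated with a small, self-contained simulation. This is only a sketch under stated assumptions: it does not use gRPC or PySpark, and `FakeServer`, `PermissionDenied`, `RefreshingReattachIterator`, and `make_token_provider` are hypothetical names invented here, not part of `reattach.py` or of any proposed patch.

```python
from typing import Callable, Iterator, List, Tuple

Metadata = List[Tuple[str, str]]


class PermissionDenied(Exception):
    """Hypothetical stand-in for a gRPC PERMISSION_DENIED status."""


class FakeServer:
    """Toy server: "token-0" stops being accepted once chunk `expire_at` is reached."""

    def __init__(self, total: int, expire_at: int) -> None:
        self.total = total
        self.expire_at = expire_at

    def stream(self, metadata: Metadata, start: int) -> Iterator[int]:
        token = dict(metadata)["authorization"]
        for i in range(start, self.total):
            if i >= self.expire_at and token == "token-0":
                raise PermissionDenied("token expired mid-stream")
            yield i


class RefreshingReattachIterator:
    """Reattaches on PermissionDenied and re-reads metadata on every attempt."""

    def __init__(
        self,
        server: FakeServer,
        metadata_provider: Callable[[], Metadata],
        max_retries: int = 3,
    ) -> None:
        self._server = server
        self._metadata_provider = metadata_provider
        self._max_retries = max_retries
        self._next_index = 0  # resume point, analogous to the last seen response

    def __iter__(self) -> Iterator[int]:
        retries = 0
        while True:
            # Fetched per attempt instead of being captured once at __init__.
            metadata = self._metadata_provider()
            try:
                for chunk in self._server.stream(metadata, self._next_index):
                    self._next_index = chunk + 1
                    yield chunk
                return  # stream completed
            except PermissionDenied:
                # Treated as retryable here; give up after max_retries.
                retries += 1
                if retries > self._max_retries:
                    raise


def make_token_provider() -> Callable[[], Metadata]:
    """Returns a provider that issues a new token on each call (hypothetical)."""
    state = {"n": 0}

    def provider() -> Metadata:
        token = f"token-{state['n']}"
        state["n"] += 1
        return [("authorization", token)]

    return provider


def static_provider() -> Metadata:
    """Models the current behaviour: the same token on every attempt."""
    return [("authorization", "token-0")]
```

With `make_token_provider()` the iterator resumes at the chunk where the stream was killed and runs to completion. With `static_provider` every reattach fails the same way, which mirrors step 4 of the description.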