zaiddialpad commented on PR #8791:
URL: 
https://github.com/apache/incubator-devlake/pull/8791#issuecomment-4200909836

   Hi — just wanted to share that we're actively hitting both issues this PR 
addresses in production, and this fix would be hugely impactful for us.
   
   ## Our setup
   
   - **DevLake v1.0.3-beta9** on GKE (Helm chart)
   - **GitHub App auth** across 4 connections (408 repos total)
   - Largest repo: ~20K PRs, ~1,200 currently OPEN
   
   ## What we're experiencing
   
   ### 1. The `updateRateRemaining` panic (the primary fix in this PR)
   
   Our lake pod is at **42 restarts** — almost entirely from this panic. The 
sequence is:
   
   1. The `github_graphql` "Collect Pull Requests" subtask enters Phase 2 
(refetching all OPEN PRs)
   2. Phase 2 sends heavy compound GraphQL queries (100 PRs per batch) that 
trigger **502 Bad Gateway** from GitHub
   3. The retry loop burns 10+ retries at 120s each per failing batch; across 
Phase 2's many batches, total runtime easily exceeds 1 hour
   4. Our GitHub App token expires (~1hr lifetime)
   5. The background `updateRateRemaining` goroutine calls `/rate_limit` with 
the expired token → **401 Unauthorized**
   6. `panic()` at `graphql_async_client.go:129` → pod crash
   
   ```
   panic: non-200 OK status code: 401 Unauthorized body: "Bad credentials"
   
   goroutine 3660 [running]:
   
github.com/apache/incubator-devlake/helpers/pluginhelper/api.(*GraphqlAsyncClient).updateRateRemaining.func1()
       /app/helpers/pluginhelper/api/graphql_async_client.go:129 +0x166
   ```
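   For anyone skimming the fix: the failure mode above boils down to a background poller treating any non-200 from `/rate_limit` as fatal. Here's a minimal sketch of the graceful-handling behavior this PR moves to — `fetchRateLimit` and the surrounding names are illustrative stubs, not DevLake's actual API:

   ```go
   package main

   import (
   	"errors"
   	"fmt"
   )

   // fetchRateLimit stands in for the GitHub /rate_limit call. This stub
   // fails the way an expired GitHub App token does: a 401 response.
   func fetchRateLimit() (int, error) {
   	return 0, errors.New("non-200 OK status code: 401 Unauthorized")
   }

   // updateRateRemaining sketches the fixed behavior: report the error and
   // keep the last known value, so one failed poll cannot crash the pod.
   func updateRateRemaining() (remaining int, ok bool) {
   	r, err := fetchRateLimit()
   	if err != nil {
   		// Before the fix this path was a panic(); now the background
   		// goroutine just logs and retries on its next tick.
   		fmt.Println("rate limit update skipped:", err)
   		return 0, false
   	}
   	return r, true
   }

   func main() {
   	_, ok := updateRateRemaining()
   	fmt.Println("survived failed update:", !ok)
   }
   ```

   With handling like this, a 401 from an expired token just means the rate-limit estimate goes stale until the next poll, instead of taking down every running task.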
   
   ### 2. GraphQL client token refresh (the second fix in this PR)
   
   Since the GraphQL client uses a `StaticTokenSource` that freezes the token 
at task start, any task running longer than ~1 hour with GitHub App auth is 
guaranteed to eventually fail. For us, the `github_graphql` task on our largest 
repo takes 2-3 hours even when it succeeds, so we hit this on every single 
pipeline run.
   
   The REST client was already fixed for this via #8746, but without this PR 
the GraphQL client remains vulnerable.
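   To illustrate why the frozen token is fatal for long tasks: a static snapshot taken at task start never observes the App token rotating, while re-reading a shared store per request does. A self-contained sketch of the difference — the `tokenStore` type and all names here are hypothetical stand-ins, not DevLake's actual `RefreshRoundTripper` implementation:

   ```go
   package main

   import (
   	"fmt"
   	"sync"
   )

   // tokenStore stands in for a cache of the GitHub App installation token
   // that a background refresher keeps up to date.
   type tokenStore struct {
   	mu    sync.Mutex
   	token string
   }

   func (s *tokenStore) Get() string {
   	s.mu.Lock()
   	defer s.mu.Unlock()
   	return s.token
   }

   func (s *tokenStore) Set(t string) {
   	s.mu.Lock()
   	defer s.mu.Unlock()
   	s.token = t
   }

   // authHeader reads the store on every call, so it picks up renewals;
   // a value captured once at task start never does.
   func authHeader(s *tokenStore) string {
   	return "Bearer " + s.Get()
   }

   func main() {
   	store := &tokenStore{token: "ghs_initial"}
   	static := "Bearer " + store.Get() // frozen at task start

   	store.Set("ghs_renewed") // App token rotated after ~1h

   	fmt.Println("static: ", static)           // still the expired token
   	fmt.Println("dynamic:", authHeader(store)) // sees the renewal
   }
   ```

   Wiring the GraphQL client to a per-request token source (as this PR does via the round tripper) is the "dynamic" column here; `StaticTokenSource` is the "static" one.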
   
   ## Impact on us
   
   - **Our largest repo has failed on 4 of the last 6 daily pipeline runs** due 
to this combination
   - Each failure wastes 15+ hours of pipeline time (retries + pod restarts + 
re-queuing)
   - The pod crashes affect **all** running tasks, not just the one that 
triggered the panic
   
   
   ## Summary
   
   This PR addresses exactly the two bugs compounding against us:
   1. The panic → replaced with graceful error handling ✅
   2. The static token → wired up to `RefreshRoundTripper` for automatic 
renewal ✅
   
   Would love to see this merged. Happy to help test if that would be useful — 
we have a production environment that reliably reproduces both issues on every 
nightly sync.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
