AlinsRan opened a new pull request, #2790: URL: https://github.com/apache/apisix-ingress-controller/pull/2790
## Problem

Profiling the e2e pipeline shows the dominant cost is **per-spec environment setup**, not teardown or in-test sleeps. Every `It` runs the following synchronously, on its critical path:

```
BeforeEach: deploy APISIX(+etcd) + ingress-controller + httpbin → block ~30-80s for readiness → run body → delete ns
```

With ~229 specs, that setup latency, paid once per spec, is the bottleneck. Trimming teardown/sleeps only shaves the tail; it does not touch the `N_specs × deploy_time` term.

## Idea

Stop deploying on the critical path. Build environments **ahead of time in the background** and have `BeforeEach` pick up a ready one, overlapping the next spec's deployment with the current spec's execution (pipelining). Isolation is unchanged: each spec still gets its own namespace + controller; we only change *when* the environment is built.

## What's here

- **`scaffold/envpool.go`**: generic, provider-agnostic pool. A buffered channel of depth `D` is kept full by `D` background workers (D=1 ⇒ double buffer: one ready while the next builds). `bgTestingT` is a minimal terratest `TestingT` so provisioning can run **outside a Ginkgo spec** (no `Expect`/`GinkgoT` in background goroutines); panics/failures are captured as `pooledEnv.err`. A per-process `AfterSuite` cleans up leftover envs.
- **`scaffold/apisix_prewarm.go`**: error-style provisioning of the **default profile** (namespace + dataplane(+etcd) + 5 tunnels + controller + httpbin) and loading it onto the scaffold.
- **`scaffold/apisix_deployer.go`**: `BeforeEach` acquires a prewarmed env; **webhook/custom profiles and any provisioning error fall back to the unchanged synchronous deploy**, so correctness is preserved.
- **`framework/k8s.go`**: readiness polling fix. Poll every 2s instead of using an exponential backoff that polled at 7.5/15.5/31.5/63.5s, i.e. very sparsely in exactly the 10-30s window where pods become ready, wasting up to ~15s per wait. Also adds `EnsureServiceReadyE` for background endpoint waits.

## Knobs

- `E2E_PREWARM` (default `true`; set `false` to disable and use the original synchronous path)
- `E2E_PREWARM_DEPTH` (default `1`)

## Expected effect & limits

- Steady-state per-spec cost drops from `P + B` (deploy + body) toward `max(P/D, B)`. For body-heavy specs the deploy is fully hidden; for deploy-bound specs the gain is bounded by deploy throughput.
- Throughput is ultimately capped by **cluster resources**: with `E2E_NODES=N` and depth `D`, up to `N × (1 in-use + D building)` environments exist at once. The default `D=1` adds at most one in-flight env per process over today's behavior. Tune `E2E_PREWARM_DEPTH` / runner size accordingly.

## Compatibility / safety

- Confined to `scaffold` + `framework`; the `Deployer` interface and spec bodies are unchanged, so downstream provider implementations reuse the pool.
- Fully gated, with a synchronous fallback on any failure.

## Validation

- `go build ./test/e2e/...`, `go vet ./test/e2e/...`, `gofmt`: all clean.
- Behavior under a live cluster (including peak resource usage and the prewarmed controller reaching Ready) needs this PR's CI run to confirm; `E2E_PREWARM=false` is the kill switch if needed.
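
As a rough check on the figures under "Expected effect & limits", the two formulas can be evaluated directly. The sketch below is illustrative only: the function names `steadyStateCost` and `peakEnvs` and the sample values (`P = 60s`, `B = 20s`, `E2E_NODES=4`) are assumptions chosen to fall within the ranges mentioned in this description, not measurements.

```go
package main

import (
	"fmt"
	"math"
)

// steadyStateCost approximates per-spec wall time in seconds.
// p is deploy time, b is spec body time, d is the prewarm depth.
// d <= 0 models the original synchronous path (P + B).
func steadyStateCost(p, b float64, d int) float64 {
	if d <= 0 {
		return p + b
	}
	return math.Max(p/float64(d), b)
}

// peakEnvs is the upper bound N × (1 in-use + D building).
func peakEnvs(nodes, depth int) int {
	return nodes * (1 + depth)
}

func main() {
	const p, b = 60.0, 20.0 // hypothetical deploy-bound spec
	for d := 0; d <= 3; d++ {
		fmt.Printf("D=%d: ~%.0fs per spec, up to %d envs with 4 nodes\n",
			d, steadyStateCost(p, b, d), peakEnvs(4, d))
	}
}
```

For this deploy-bound example the model gives 80s on the synchronous path, 60s at `D=1`, and 30s at `D=2`, while the environment bound for four nodes grows from 4 to 8 to 12. That is the trade-off `E2E_PREWARM_DEPTH` controls: more depth hides more deploy time only as long as the cluster can actually run the extra environments.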
