tverkade opened a new issue, #13422:
URL: https://github.com/apache/cloudstack/issues/13422
### problem
On a 4.22.1.0 deployment with two management servers behind a load balancer,
VM and system-VM console access fails intermittently. The browser receives a
generic Apache-style "Internal Server Error" page. Reopening the console
sometimes works, sometimes fails, with no change to the environment.
The failure has been isolated to the console session token validation path.
Every surrounding component has been verified healthy. Pinning the browser
directly to a single management server (bypassing the load balancer) does not
resolve it, so this is not simply LB source-IP rewriting.
- Browser intermittently shows a stock "Internal Server Error" (Apache-style)
  page when opening a console.
- CPVM (/var/log/cloud/cloud.out) on failure:
    Session <uuid> has already been used, cannot connect
    External authenticator failed authentication request for vm <vm-uuid> with sid <sid>
    ERROR ConsoleProxyNoVNCHandler ... Failed to create viewer ...
    com.cloud.consoleproxy.AuthenticationException: External authenticator failed request
- Also observed earlier:
    org.eclipse.jetty.websocket.api.CloseException: TimeoutException: Idle timeout expired: 300000/300000 ms.
KEY DIAGNOSTIC FINDING
The cloud.console_session table records the load balancer IP (10.125.128.38)
as console_endpoint_creator_address for sessions, and a subset of rows never
reach acquired/removed (these correlate with the failures):
id    uuid        instance  host  acquired             removed              creator_address  client_address
1414  94403a87..  6         14    NULL                 NULL                 172.31.1.204     NULL
1413  91842ef1..  85        14    2026-06-15 13:40:54  2026-06-15 13:41:00  172.31.1.204     172.31.1.204
1412  ade60441..  1902      27    2026-06-15 13:39:10  2026-06-15 13:39:20  10.125.128.38    172.31.1.204
1411  135aaa62..  1902      27    2026-06-15 13:37:16  2026-06-15 13:39:10  10.125.128.38    172.31.1.204
1410  6cd4ac53..  1881      27    NULL                 NULL                 10.125.128.38    NULL
console_endpoint_creator_address is being recorded as the load balancer
(10.125.128.38) and in some rows as the client IP (172.31.1.204), never as a
real management server IP (.37/.39). Neither .38 nor the client is a valid
validation target, which appears to be why those sessions are never acquired.
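To check the correlation over a larger sample, the rows above can be tallied with a short script. This is only a sketch: the rows are copied from the table in this report, while the `ConsoleSession` class, the `summarize` helper, and the placeholder management server addresses are hypothetical names introduced for illustration (the full management IPs are not given here, so real values must be substituted).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConsoleSession:
    """One row of cloud.console_session, reduced to the columns shown above."""
    id: int
    acquired: Optional[str]
    removed: Optional[str]
    creator_address: str
    client_address: Optional[str]


# Rows copied from the table in this report.
ROWS = [
    ConsoleSession(1414, None, None, "172.31.1.204", None),
    ConsoleSession(1413, "2026-06-15 13:40:54", "2026-06-15 13:41:00",
                   "172.31.1.204", "172.31.1.204"),
    ConsoleSession(1412, "2026-06-15 13:39:10", "2026-06-15 13:39:20",
                   "10.125.128.38", "172.31.1.204"),
    ConsoleSession(1411, "2026-06-15 13:37:16", "2026-06-15 13:39:10",
                   "10.125.128.38", "172.31.1.204"),
    ConsoleSession(1410, None, None, "10.125.128.38", None),
]


def summarize(rows, management_ips):
    """Group session ids by creator-address validity and acquisition state."""
    not_ms = [r.id for r in rows if r.creator_address not in management_ips]
    never_acquired = [r.id for r in rows if r.acquired is None]
    acquired_anyway = [
        r.id for r in rows
        if r.creator_address not in management_ips and r.acquired is not None
    ]
    return {
        "creator_not_ms": not_ms,
        "never_acquired": never_acquired,
        "acquired_despite_non_ms_creator": acquired_anyway,
    }


if __name__ == "__main__":
    # Placeholders: replace with the real management server IPs.
    summary = summarize(ROWS, {"<ms-1-ip>", "<ms-2-ip>"})
    for key, ids in summary.items():
        print(key, ids)
```

Running the same tally against a full export of the table would show how often a non-management creator address coincides with a session that is never acquired.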
On failure, NOTHING is logged on either management server
(management-server.log filtered for
console|authentication|failed|console_session shows only cluster heartbeat).
On success, the minting MS logs the full createConsoleEndpoint -> Compose
console url -> Adding allowed session -> ConsoleAccessAuthenticationCommand
flow. So failing validations are being rejected at the CPVM and never reach a
real MS.
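The log filter described above can be reproduced with a few lines of Python. This is a minimal sketch: the pattern is the one quoted in this report, but matching it case-insensitively is an assumption, and `console_lines` is a hypothetical helper name. The sample input uses only the message fragments named above, not real log lines.

```python
import re

# Pattern quoted in the report; case-insensitive matching is an assumption.
CONSOLE_PATTERN = re.compile(
    r"console|authentication|failed|console_session", re.IGNORECASE
)


def console_lines(lines):
    """Return the lines that match the console-related filter."""
    return [line.rstrip("\n") for line in lines if CONSOLE_PATTERN.search(line)]


if __name__ == "__main__":
    # Message fragments named in the report, used as stand-ins for log lines.
    sample = [
        "cluster heartbeat",
        "createConsoleEndpoint",
        "Compose console url",
        "Adding allowed session",
        "ConsoleAccessAuthenticationCommand",
    ]
    for line in console_lines(sample):
        print(line)
```

Note that the "Adding allowed session" fragment contains none of the filter terms, so a filter of this shape would not surface that step even on a successful attempt.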
RULED OUT (verified, not assumed)
- CPVM health: up 3+ days, ports 80/8001/8080 bound by the cloud Java process
  (no stray Apache), actively serving noVNC. curl of
  /resource/noVNC/vnc.html returns 200.
- Network/MTU: DF ping at 1472 bytes succeeds with 0% loss between client and
  CPVM; path MTU is the full 1500.
- Management servers: both Up in mshost; both reachable on 8250 from CPVM;
  clocks within ~1s (verified with date -u).
- KVM host clock: within ~1s of MS.
- VNC port: virsh vncdisplay v-1902-VM = :4 (5904), exactly matching the port
  the MS hands out; no stale-port mismatch.
- Agent version: host cloudstack-agent = 4.22.1.0-shapeblue0, matches MS.
- cluster.node.IP: correctly set per node; no rogue/extra management instance.
- CPVM rebuild: destroyed/recreated multiple times, with no effect.
- Pinning browser directly to a single MS
  (https://<mgmt-server-ip>:8080/client), bypassing the LB: still fails.
  (So this is NOT solely LB source-IP rewriting / cross-MS in-memory token,
  despite PR #7094 being present and the console_session table being
  populated.)
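Two of the checks above rest on simple arithmetic, spelled out here for anyone repeating them. The sketch assumes an IPv4 header without options (20 bytes), the 8-byte ICMP echo header, and the standard VNC convention that display :N listens on TCP port 5900 + N; the function names are hypothetical.

```python
IPV4_HEADER_BYTES = 20   # IPv4 header without options
ICMP_HEADER_BYTES = 8    # ICMP echo request header
VNC_BASE_PORT = 5900     # VNC display :0 maps to TCP 5900


def ip_packet_size(icmp_payload_bytes):
    """Total IPv4 packet size produced by a ping with the given payload."""
    return icmp_payload_bytes + ICMP_HEADER_BYTES + IPV4_HEADER_BYTES


def vnc_port(display):
    """TCP port for a VNC display string such as ':4'."""
    return VNC_BASE_PORT + int(display.rsplit(":", 1)[-1])


if __name__ == "__main__":
    print(ip_packet_size(1472))  # 1500, the full Ethernet MTU
    print(vnc_port(":4"))        # 5904
```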
WHY THIS LOOKS LIKE A BUG, NOT MISCONFIGURATION
- PR #7094 (DB-backed console sessions for multi-MS) is present: the
  console_session table exists and is written.
- Yet console_endpoint_creator_address is being populated with the load
  balancer IP and sometimes the client IP, rather than the processing
  management server's own IP. Those values are not valid validation targets.
- Bypassing the load balancer (direct-to-MS) does not fix it, so source-IP
  rewriting by the LB is not a complete explanation.
- The result is intermittent, single-use-token "already used / external
  authenticator failed" rejections at the CPVM, with no corresponding log on
  any management server.
### versions
- CloudStack version: 4.22.1.0-shapeblue0
- Hypervisor: KVM (host agent cloudstack-agent 4.22.1.0-shapeblue0, matched
  to MS)
- Management servers: 2 nodes (mshost table: both nodes Up)
- HAProxy load balancer (fronts the management/UI tier; NOT a management
  server)
- SSL: disabled (consoleproxy.sslEnabled=false, no consoleproxy.url.domain)
- CPVM: v-1902-VM, agent up, serving noVNC
- Client: Windows workstation
- Console settings present: console.session.cleanup.interval=180,
  console.session.cleanup.retention.hours=240,
  consoleproxy.session.timeout=300000, consoleproxy.session.max=50,
  novnc.console.default=true, novnc.console.sourceip.check.enabled=false
### The steps to reproduce the bug
1. Deploy CloudStack 4.22.1.0 with two management servers behind a load
balancer fronting the management/UI tier. KVM hypervisor, SSL disabled.
2. From a client on a different subnet, open the CloudStack UI through the
load balancer and click "View Console" on a running VM or system VM.
3. Repeat opening the console several times.
4. Observe that console access succeeds intermittently — some attempts load
the noVNC console, others return a generic "Internal Server Error" page in the
browser.
5. On a failing attempt, check the CPVM log (/var/log/cloud/cloud.out):
    Session <uuid> has already been used, cannot connect
    External authenticator failed authentication request for vm <vm-uuid> with sid <sid>
    com.cloud.consoleproxy.AuthenticationException: External authenticator failed request
6. On a failing attempt, check both management-server logs — NOTHING is
logged on either MS (no createConsoleEndpoint, no
ConsoleAccessAuthenticationCommand). On a succeeding attempt, the full flow IS
logged on the minting MS.
7. Inspect the cloud.console_session table:
    SELECT id, uuid, acquired, removed,
           console_endpoint_creator_address, client_address
    FROM cloud.console_session
    ORDER BY created DESC
    LIMIT 10;
Note that console_endpoint_creator_address is recorded as the load
balancer IP, and in some rows the client IP, never a real management server
IP. Rows with these creator addresses are the ones that never reach
'acquired'/'removed', and these correlate with the failures.
### What to do about it?
console_endpoint_creator_address is being populated with the load balancer
IP (and sometimes the client IP) instead of the processing management server's
own management IP. Since the CPVM validates the one-time console token against
the recorded creator address, and neither the LB nor the client can service
that validation callback, those sessions are never acquired and the CPVM
rejects the connection — producing the intermittent HTTP 500.
Expected: console_endpoint_creator_address should be the management server
that processed the createConsoleEndpoint call (a real MS IP from the `host`
setting), regardless of whether the request arrived via a load balancer.