CMStevenKusters opened a new issue, #13079:
URL: https://github.com/apache/cloudstack/issues/13079
## `keystore-setup` fails to create `agent.properties` on fresh KVM hosts;
KVM `addHost` cert flow then loops with "Failed to find keystore passphrase"
and `setupAgentSecurity()` throws "Failed to setup certificate in the KVM
agent's keystore file" (4.22)
### problem
When `LibvirtServerDiscoverer.setupAgentSecurity()` invokes
`scripts/util/keystore-setup` on a freshly-installed KVM agent (one whose
`/etc/cloudstack/agent/` directory contains only the package-shipped
`environment.properties` and `log4j-cloud.xml` - i.e. the natural state on a
first-time `addHost`, or after an operator has wiped agent state to recover
from a botched provisioning attempt), the script silently fails to create
`agent.properties` and silently fails to persist `keystore.passphrase=...` to
it. The downstream `keystore-cert-import` invocation then reads the
missing/empty `agent.properties`, exits 1 with "Failed to find keystore
passphrase from file: /etc/cloudstack/agent/agent.properties, quitting!", and
after the 3-attempt retry loop in `setupAgentSecurity()` the management server
throws `CloudRuntimeException("Failed to setup certificate in the KVM agent's
keystore file, please see logs and configure manually!")`. The host never gets
added.
The defect is in `scripts/util/keystore-setup`. The passphrase-write block
is gated on `[ -f "$PROPS_FILE" ]`:
```bash
PROPS_FILE="$1"
KS_FILE="$2"
KS_PASS="$3"
KS_VALIDITY="$4"
CSR_FILE="$5"
...
# Re-use existing password or use the one provided
if [ -f "$PROPS_FILE" ]; then # <- gates the entire write block
OLD_PASS=$(sed -n '/^keystore.passphrase/p' "$PROPS_FILE" | sed
's/^keystore.passphrase=//g')
if [ -n "${OLD_PASS// }" ]; then
KS_PASS="$OLD_PASS"
else
sed -i "/^keystore.passphrase.*/d" "$PROPS_FILE" 2>&1 | $LOGGER_CMD
|| true
echo "keystore.passphrase=$KS_PASS" >> "$PROPS_FILE" # <- never
reached if file missing
fi
fi
```
The intent appears to have been "if a previous props file exists, reuse its
passphrase; otherwise use the one provided as `$3`" - and on the otherwise
path, write `$3` into a fresh file. But the *write* is also inside the gate, so
on a fresh host the new passphrase is never persisted. The script does proceed
(it has `KS_PASS` from arg `$3` so `keytool -genkey` and `keytool -certreq`
work), so `cloud.jks` and `cloud.csr` come out fine. The trailing `chmod 600
"$PROPS_FILE"` fails with `chmod: cannot access
'/etc/cloudstack/agent/agent.properties': No such file or directory` to stderr,
but that's swallowed via `|| $LOGGER_CMD ...` and the script's final exit
status is the logger's (always 0). MS reads the CSR from stdout and proceeds.
The next stage (`keystore-cert-import`, called with all 10 args correctly
populated by `setupAgentSecurity()`) then reads passphrase from
`agent.properties`:
```bash
KS_PASS=$(sed -n '/keystore.passphrase/p' "$PROPS_FILE" 2>/dev/null | sed
's/keystore.passphrase=//g' 2>/dev/null)
if [ -z "${KS_PASS// }" ]; then
echo "Failed to find keystore passphrase from file: $PROPS_FILE,
quitting!"
exit 1
fi
```
(Note: arg `$2` `KS_PASS` is overwritten here on the KVM path - the arg `$2`
is only honoured inside the `[ ! -f "$LIBVIRTD_FILE" ]` systemvm branch above.
So the file is the only source of truth on KVM agents.)
`KS_PASS` ends up empty, script exits 1.
`setupAgentSecurity()` retries three times (each attempt deterministically
the same since the bug is reproduced from clean state on every retry), then
throws.
### evidence
`bash -x` trace of `keystore-cert-import` on the agent during a failing
`addHost`, captured by wrapping the upstream script with a tracing shim:
```
+ '[' '!' -f /etc/libvirt/libvirtd.conf ']'
++ sed -n /keystore.passphrase/p /etc/cloudstack/agent/agent.properties
++ sed s/keystore.passphrase=//g
+ KS_PASS=
+ '[' -z '' ']'
+ echo 'Failed to find keystore passphrase from file:
/etc/cloudstack/agent/agent.properties, quitting!'
+ exit 1
```
State of `/etc/cloudstack/agent/` immediately after `keystore-setup`
returned (note the missing `agent.properties` despite the script having
"succeeded"):
```
total 24
-rwxr-xr-x 1 root root 904 Nov 7 environment.properties <- package
-rwxr-xr-x 1 root root 2601 Nov 7 log4j-cloud.xml <- package
-rw------- 1 root root 2742 ... cloud.jks <- created
by keystore-setup
<-
agent.properties NOT created
<-
cloud.csr written + cat'd to stdout (consumed by MS)
```
Argument-count verification on the agent side (instrumented script logging
argc/argv on entry):
```
keystore-setup argc=5 argv=[/etc/cloudstack/agent/agent.properties
/etc/cloudstack/agent/cloud.jks <16-char-random> 365
/etc/cloudstack/agent/cloud.csr]
keystore-cert-import argc=10 arg6_len=1696 arg8_len=1814 arg10_len=0
```
So `setupAgentSecurity()` is sending all 5 / 10 args correctly - the
dispatch side is fine. The defect is purely inside
`scripts/util/keystore-setup`.
`management-server.log` for the same `addHost` attempt - three back-to-back
invocations of `keystore-cert-import`, all returning the same "Failed to find
keystore passphrase" output, then the wrapper exception:
```
... DEBUG c.c.u.s.SSHCmdHelper Executing cmd: sudo
/usr/share/cloudstack-common/scripts/util/keystore-cert-import ...
... DEBUG c.c.u.s.SSHCmdHelper SSH command output: Failed to find keystore
passphrase from file: /etc/cloudstack/agent/agent.properties, quitting!
... [retry 2: same output]
... [retry 3: same output]
... WARN c.c.h.k.d.KvmServerDiscoverer can't setup agent, due to
com.cloud.utils.exception.CloudRuntimeException: Failed to setup certificate in
the KVM agent's keystore file, please see logs and configure manually!
```
### versions
- Apache CloudStack 4.22.0.0 (apache packaging, apt install)
- Hypervisor: KVM
- Agent host OS: Ubuntu Server 24.04 LTS, OpenJDK 17
- Management server OS: Ubuntu Server 24.04 LTS, OpenJDK 17, CA framework
`provider.plugin=root`
- Agent dir before addHost: only `environment.properties` +
`log4j-cloud.xml` (i.e. the natural state right after `apt install
cloudstack-agent` on a fresh host that has not yet been registered)
### steps to reproduce
1. Install `cloudstack-management` 4.22.0.0 on a clean host. Bootstrap CA
(default - `ca.framework.provider.plugin=root` works fine for this; not the
issue).
2. Install `cloudstack-agent` 4.22.0.0 on a clean KVM host. Verify
`/etc/cloudstack/agent/` contains only `environment.properties` +
`log4j-cloud.xml` (the package-shipped files).
3. Stop `cloudstack-agent` on the KVM host (so no startup process can
pre-create state).
4. From the management server, dispatch `addHost` for the KVM host - e.g.
via `cmk add host` or `ngine_io.cloudstack.cs_host`.
5. Observe failure with `errortext: 'Could not add host at [http://...] ...
due to: [ can't setup agent, due to
com.cloud.utils.exception.CloudRuntimeException: Failed to setup certificate in
the KVM agent's keystore file, please see logs and configure manually!]'`.
6. SSH to the KVM host as root and inspect:
- `/etc/cloudstack/agent/cloud.jks` exists (created by `keystore-setup`)
- `/etc/cloudstack/agent/cloud.csr` does *not* exist on disk after the
call (consumed by MS via stdout, then trailing `chmod` fails silently - this is
a separate cosmetic issue)
- `/etc/cloudstack/agent/agent.properties` does **not** exist - this is
the bug
7. Re-running `addHost` after manually creating `agent.properties` with a
`keystore.passphrase=<any-non-empty-value>` line lets the second invocation
succeed, because `keystore-setup`'s `if [ -f "$PROPS_FILE" ]` gate now passes
and the existing passphrase is reused. This is a workable manual workaround per
host.
### what to do about it
**Primary - patch `scripts/util/keystore-setup` to ensure `PROPS_FILE`
exists before the passphrase-write branch.** Two-block change, restructure the
gate so the file is created if missing, then the passphrase is always either
reused or written:
```diff
-# Re-use existing password or use the one provided
-if [ -f "$PROPS_FILE" ]; then
- $LOGGER_CMD "Previous props file exists, trying to extract password"
- OLD_PASS=$(sed -n '/^keystore.passphrase/p' "$PROPS_FILE" | sed
's/^keystore.passphrase=//g')
- if [ -n "${OLD_PASS// }" ]; then
- KS_PASS="$OLD_PASS"
- $LOGGER_CMD "Password extraction successful"
- else
- sed -i "/^keystore.passphrase.*/d" "$PROPS_FILE" 2>&1 | $LOGGER_CMD
|| true
- echo "keystore.passphrase=$KS_PASS" >> "$PROPS_FILE"
- if [ $? != 0 ]; then
- $LOGGER_CMD "Could not add new password to agent.properties"
- else
- $LOGGER_CMD "New keystore password set"
- fi
- fi
-fi
+# Ensure props file exists, then re-use existing passphrase or persist the
one provided.
+if [ ! -f "$PROPS_FILE" ]; then
+ touch "$PROPS_FILE"
+ chmod 600 "$PROPS_FILE"
+ $LOGGER_CMD "Created new props file at $PROPS_FILE"
+fi
+
+OLD_PASS=$(sed -n '/^keystore.passphrase/p' "$PROPS_FILE" | sed
's/^keystore.passphrase=//g')
+if [ -n "${OLD_PASS// }" ]; then
+ KS_PASS="$OLD_PASS"
+ $LOGGER_CMD "Password extraction successful"
+else
+ sed -i "/^keystore.passphrase.*/d" "$PROPS_FILE" 2>&1 | $LOGGER_CMD ||
true
+ echo "keystore.passphrase=$KS_PASS" >> "$PROPS_FILE"
+ if [ $? != 0 ]; then
+ $LOGGER_CMD "Could not add new password to agent.properties"
+ else
+ $LOGGER_CMD "New keystore password set"
+ fi
+fi
```
After this, fresh-host KVM `addHost` works first try. Re-runs are idempotent
(existing passphrase reused).
**Secondary - harden `scripts/util/keystore-cert-import` to fail loudly
instead of `exit 0` in error branches.** Same script has bare `exit` (no
status) in two error branches, which return exit code 0 and let the MS see
`SSHCmdResult.isSuccess()=true` even when nothing was imported:
```diff
# Import certificate
if [ ! -z "${CERT// }" ]; then
echo "$CERT" > "$CERT_FILE"
elif [ ! -f "$CERT_FILE" ]; then
- echo "Cannot find certificate file: $CERT_FILE, exiting"
- exit
+ echo "Cannot find certificate file: $CERT_FILE, exiting" >&2
+ exit 1
fi
# Import ca certs
if [ ! -z "${CACERT// }" ]; then
echo "$CACERT" > "$CACERT_FILE"
elif [ ! -f "$CACERT_FILE" ]; then
- echo "Cannot find ca certificate file: $CACERT_FILE, exiting!"
- exit
+ echo "Cannot find ca certificate file: $CACERT_FILE, exiting!" >&2
+ exit 1
fi
```
`exit 1` (with status) lets `SSHCmdResult.isSuccess()` correctly return
false, and the existing 3-retry-then-throw loop in `setupAgentSecurity()` will
surface the real failure to the operator instead of leaving the agent with a
broken keystore that only manifests later as runtime TLS handshake errors.
### relationship to #13064
This is a **sibling defect**, not the same one as #13064 - same script
family, same general class of silent-failure footguns, but a different trigger
and a different downstream symptom:
| | this issue | #13064 |
|---|---|---|
| Hypervisor | KVM | VMware |
| Triggered by | fresh KVM agent in `addHost` (no prior `agent.properties`)
| VMware SSH dispatch of `SetupCertificateCommand` to a SystemVM |
| Buggy script | `keystore-setup` (passphrase-write gated on file existing)
| `keystore-cert-import` (bare `exit` in error branch) |
| Args sent by MS | all 5 / 10 args correct | only 6 of 10 args (`$7`/`$8`
CACERT missing on the VMware path) |
| Exit code from script | exit 1 (loud failure) | exit 0 (silent success
masking real failure) |
| Downstream symptom | `setupAgentSecurity()` throws "Failed to setup
certificate in the KVM agent's keystore file" | agent's `cloud.jks` ends up
missing the trustedCertEntry; agent later fails TLS handshake to MS:8250 with
`certificate_unknown` |
| Surface time | immediate, on first `addHost` | minutes later, once the
agent actually tries to connect |
The proposed Secondary fix above (replacing bare `exit` with `exit 1`)
addresses one half of #13064 - once those branches return non-zero, the MS
`setupAgentSecurity()` retry-and-throw loop will surface the VMware-dispatch
arg-count bug as a proper failure instead of letting the silent-success
continue to hide it. The actual primary fix for #13064 (passing CACERT args
from the VMware SSH dispatch path) is independent of this change and orthogonal.
### operator workaround (until upstream fix lands)
Replace `/usr/share/cloudstack-common/scripts/util/keystore-setup` on each
KVM agent with a thin wrapper that pre-creates `agent.properties` with the
passphrase from arg `$3` before exec'ing the original. Originals preserved as
`.orig`:
```bash
#!/bin/bash
# Workaround for upstream keystore-setup not creating agent.properties on
# fresh hosts. See <this issue link>.
if [ $# -ge 5 ]; then
PROPS_FILE="$1"
KS_PASS="$3"
if [ ! -f "$PROPS_FILE" ]; then
touch "$PROPS_FILE"
chmod 600 "$PROPS_FILE"
fi
EXISTING=$(sed -n '/^keystore\.passphrase=/p' "$PROPS_FILE" | head -1 |
sed 's/^keystore\.passphrase=//')
if [ -z "${EXISTING// }" ]; then
echo "keystore.passphrase=$KS_PASS" >> "$PROPS_FILE"
fi
exec /usr/share/cloudstack-common/scripts/util/keystore-setup.orig "$@"
fi
exec /usr/share/cloudstack-common/scripts/util/keystore-setup.orig "$@"
```
Verified working on Ubuntu 24.04 + CloudStack 4.22.0.0 KVM - `addHost`
succeeds first try after the wrapper is in place and the agent dir is wiped to
clean state.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]