lhotari commented on PR #686:
URL:
https://github.com/apache/pulsar-helm-chart/pull/686#issuecomment-4475171030
Thanks for the clarification @smbecker. I dug into the templates and I think
the PR is half-right — it correctly drops `*.` on the regular service line, but
the wildcard on the **headless** service line is still needed. Some details
below.
### Why the wildcard is still needed on the headless line
TLS wildcards (RFC 6125) match only one DNS label, so `*.foo.bar` matches
`x.foo.bar` but **not** `foo.bar` itself. That's why your test passed once `*.`
was removed from the regular service line — but the same rule means the
wildcard on the headless line is covering a different set of names entirely:
- `broker-statefulset.yaml:33` uses `serviceName: {{
pulsar.broker.service.headless }}`. StatefulSet pods get DNS records of the
form `<pod>.<release>-broker-headless.<ns>.svc.<clusterDomain>` (e.g.
`pulsar-ci-broker-0.pulsar-ci-broker-headless.default.svc.cluster.local`).
These pod-level FQDNs are what brokers advertise to each other and to
clients/proxies during lookups, so they need to match the SAN.
- Same story for zookeeper StatefulSet via `zookeeper-headless-service`.
- `standalone-deployment.yaml:90-91,158` sets `hostname: standalone` +
`subdomain: {{ pulsar.standalone.service.headless }}` and runs
`--advertised-address "$(hostname -f)"`, which expands to
`standalone.<release>-standalone-headless.<ns>.svc.<clusterDomain>` — again a
pod-level FQDN under the headless service.
Without `*.<release>-<component>-headless.<ns>.svc.<clusterDomain>` in the
SAN, TLS hostname verification will fail for any client (or peer broker) that
uses the advertised/pod FQDN. The local test in this PR connects to the
**regular** service FQDN, which doesn't exercise this path — that's why it
looks OK locally.
### Suggested change
```yaml
{{- if or (eq .componentConfig.component "broker") (eq
.componentConfig.component "zookeeper") }}
- {{ printf "*.%s-%s-headless.%s.svc.%s" ... | quote }} # keep — pod FQDNs
- {{ printf "%s-%s-headless.%s.svc.%s" ... | quote }} # optional — bare
headless name
{{- end }}
- {{ printf "%s-%s.%s.svc.%s" ... | quote }} # PR change is
correct here
- {{ printf "%s-%s" ... | quote }}
```
The wildcard for the **regular** service was indeed useless (DNS for
`x.<regular-service>.<ns>...` doesn't resolve) and your PR's change there is
good — just please keep the wildcard on the headless line.
### CI gap
The current `ci::test_pulsar_producer_consumer` only verifies client →
proxy/standalone connectivity. To catch this regression we'd want a test that
exercises a pod-level FQDN under a headless service with TLS — e.g. a broker →
broker or proxy → broker lookup that ends up using
`<pod>.<release>-broker-headless...`, or a direct TLS connect from the toolset
to a broker pod via its headless FQDN. Could you add something along those
lines?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]