lhotari commented on PR #686:
URL: 
https://github.com/apache/pulsar-helm-chart/pull/686#issuecomment-4475171030

   Thanks for the clarification @smbecker. I dug into the templates and I think 
the PR is half-right — it correctly drops `*.` on the regular service line, but 
the wildcard on the **headless** service line is still needed. Some details 
below.
   
   ### Why the wildcard is still needed on the headless line
   
   TLS wildcards (RFC 6125) match only one DNS label, so `*.foo.bar` matches 
`x.foo.bar` but **not** `foo.bar` itself. That's why your test passed once `*.` 
was removed from the regular service line — but the same rule means the 
wildcard on the headless line is covering a different set of names entirely:
   
   - `broker-statefulset.yaml:33` uses `serviceName: {{ 
pulsar.broker.service.headless }}`. StatefulSet pods get DNS records of the 
form `<pod>.<release>-broker-headless.<ns>.svc.<clusterDomain>` (e.g. 
`pulsar-ci-broker-0.pulsar-ci-broker-headless.default.svc.cluster.local`). 
These pod-level FQDNs are what brokers advertise to each other and to 
clients/proxies during lookups, so they need to match the SAN.
   - Same story for zookeeper StatefulSet via `zookeeper-headless-service`.
   - `standalone-deployment.yaml:90-91,158` sets `hostname: standalone` + 
`subdomain: {{ pulsar.standalone.service.headless }}` and runs 
`--advertised-address "$(hostname -f)"`, which expands to 
`standalone.<release>-standalone-headless.<ns>.svc.<clusterDomain>` — again a 
pod-level FQDN under the headless service.
   
   Without `*.<release>-<component>-headless.<ns>.svc.<clusterDomain>` in the 
SAN, TLS hostname verification will fail for any client (or peer broker) that 
uses the advertised/pod FQDN. The local test in this PR connects to the 
**regular** service FQDN, which doesn't exercise this path — that's why it 
looks OK locally.
   
   ### Suggested change
   
   ```yaml
   {{- if or (eq .componentConfig.component "broker") (eq 
.componentConfig.component "zookeeper") }}
   - {{ printf "*.%s-%s-headless.%s.svc.%s" ... | quote }}   # keep — pod FQDNs
   - {{ printf "%s-%s-headless.%s.svc.%s" ... | quote }}     # optional — bare 
headless name
   {{- end }}
   - {{ printf "%s-%s.%s.svc.%s" ... | quote }}              # PR change is 
correct here
   - {{ printf "%s-%s" ... | quote }}
   ```
   
   The wildcard for the **regular** service was indeed useless (DNS for 
`x.<regular-service>.<ns>...` doesn't resolve) and your PR's change there is 
good — just please keep the wildcard on the headless line.
   
   ### CI gap
   
   The current `ci::test_pulsar_producer_consumer` only verifies client → 
proxy/standalone connectivity. To catch this regression we'd want a test that 
exercises a pod-level FQDN under a headless service with TLS — e.g. a broker → 
broker or proxy → broker lookup that ends up using 
`<pod>.<release>-broker-headless...`, or a direct TLS connect from the toolset 
to a broker pod via its headless FQDN. Could you add something along those 
lines?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to