mini666 commented on PR #8030: URL: https://github.com/apache/hbase/pull/8030#issuecomment-4608173920
@junegunn Sorry for the late reply. To be precise: **this patch isn't deployed to production yet** (it's not merged). What we rolled out to production was a *configuration* workaround, `hbase.client.registry.impl=RpcConnectionRegistry`. It addresses the same KDC-load symptom through a different mechanism. With it, connections authenticate under the server's shared UGI `Subject`, so the service ticket is cached and reused instead of being re-fetched for every connection.

We don't have Grafana/ES charts for this. We did, however, capture `tcpdump` traffic on the KDC port (88) on one production HMaster (~228 tables) during a 5-minute snapshot-batch window, once before and once after the workaround:

| | Before (`ZKConnectionRegistry`) | After (`RpcConnectionRegistry`) |
|---|---|---|
| KDC TCP connections | 955 | 67 (**-93%**) |
| `AS_REQ` (TGT) | 469 | 19 |
| `TGS_REQ` (service ticket) | 486 | 48 |

Before the change, a single KDC instance received ~486 requests in that window. That exceeded the KDC's rate limit and got the HMaster's IP blocked, which was the original symptom.

One caveat on attribution: these numbers measure the **workaround**, not this patch directly. They do confirm that connection setup in snapshot `validate()` was the dominant source of KDC traffic. Those are exactly the connections this patch removes. The two approaches differ in how they fix it: the workaround *caches* tickets, while this patch *eliminates* the redundant connections at the root. This patch therefore still helps deployments that keep the default `ZKConnectionRegistry`. I haven't produced a separate capture that isolates this patch's effect on its own.
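The snippet below is a quick cross-check of the table arithmetic. The counts are copied from the capture above, and nothing here is new data. One detail is inferred from the totals rather than from inspecting the pcap: in both captures, `AS_REQ + TGS_REQ` equals the number of KDC TCP connections, which suggests each connection carried exactly one request.

```python
# Counts copied from the tcpdump capture on the KDC port (88):
# one HMaster, 5-minute snapshot-batch window, (before, after) the workaround.
CAPTURE = {
    "KDC TCP connections": (955, 67),
    "AS_REQ (TGT)": (469, 19),
    "TGS_REQ (service ticket)": (486, 48),
}


def reduction_percent(before: int, after: int) -> float:
    """Percentage drop from `before` to `after` (positive means fewer)."""
    return 100.0 * (before - after) / before


# Inferred from the totals: every KDC connection carried one request,
# so AS_REQ + TGS_REQ equals the connection count on both sides.
conn_before, conn_after = CAPTURE["KDC TCP connections"]
as_before, as_after = CAPTURE["AS_REQ (TGT)"]
tgs_before, tgs_after = CAPTURE["TGS_REQ (service ticket)"]
assert as_before + tgs_before == conn_before  # 469 + 486 == 955
assert as_after + tgs_after == conn_after  # 19 + 48 == 67

for label, (before, after) in CAPTURE.items():
    pct = reduction_percent(before, after)
    print(f"{label:26s} {before:4d} -> {after:3d}  (-{pct:.1f}%)")
```

Running it gives -93.0% for connections, -95.9% for `AS_REQ` and -90.1% for `TGS_REQ`. The larger drop in TGT requests is consistent with the shared `Subject` reusing one TGT instead of obtaining a new one per connection.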
