rdhabalia opened a new pull request, #23974:
URL: https://github.com/apache/pulsar/pull/23974
### Motivation
In Apache Pulsar, the broker enables producers and consumers to connect to a
topic and provides an API to retrieve topic statistics. These stats include a
list of connected producers and consumers, along with their IP addresses and
connection times. This information is particularly valuable when dealing with a
large number of producers and consumers from various client hosts, as it helps
troubleshoot issues such as:
- Identifying which client host has an active consumer
- Detecting if a client host has stopped consuming messages
- Diagnosing message backlogs
Thus, mapping the client host IP to the corresponding producer or consumer
is crucial.
**The Issue with Reverse Proxies**
However, this mapping breaks when a reverse proxy is used between the client
and broker. In such cases, the broker records only the proxy's IP address for
all connected producers and consumers, making it difficult to identify the
actual client host. Apache Pulsar supports multiple proxy solutions, such as
Pulsar-Proxy and SNI Proxy, which further complicates troubleshooting by
obscuring client IPs.
To resolve this, this PR ensures that when a client library connects to a
broker via a proxy, it sends the actual client IP address. The broker then
correctly identifies and records this IP in the stats API, mapping it to the
appropriate producer or consumer. This approach abstracts the proxy layer from
users, allowing them to see accurate client IPs without any additional effort.
This PR does not change the client-broker protocol, the API definitions, or
the configuration.
### Modifications
The client library sends an `ip-address` property when it detects that it is
connecting through a proxy, and the broker reports that address in the stats of
the corresponding producer or consumer.
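
The selection logic described above can be illustrated with a minimal sketch. This is not the code in this PR: the class name `ClientAddressSketch`, the method `resolveStatsAddress`, and the idea of passing client properties as a plain `Map` are assumptions made for illustration only. The only detail taken from the description is the `ip-address` property name.

```java
import java.util.Map;
import java.util.Optional;

// Illustrative only: hypothetical names, not Pulsar broker classes.
public class ClientAddressSketch {

    // Property name taken from the PR description.
    static final String IP_ADDRESS_PROPERTY = "ip-address";

    /**
     * Picks the address to report in stats: the client-reported value if one
     * was sent and is non-blank, otherwise the socket peer address (which is
     * the proxy's address when a proxy sits in between).
     */
    static String resolveStatsAddress(String socketPeerAddress,
                                      Map<String, String> clientProperties) {
        if (clientProperties == null) {
            return socketPeerAddress;
        }
        return Optional.ofNullable(clientProperties.get(IP_ADDRESS_PROPERTY))
                .map(String::trim)
                .filter(value -> !value.isEmpty())
                .orElse(socketPeerAddress);
    }

    public static void main(String[] args) {
        // Connection through a proxy: the client reported its own address.
        System.out.println(resolveStatsAddress(
                "10.0.0.5", Map.of(IP_ADDRESS_PROPERTY, "192.168.1.20")));

        // Direct connection: no property, so the socket address is used.
        System.out.println(resolveStatsAddress("192.168.1.20", Map.of()));
    }
}
```

Falling back to the socket peer address keeps the reported value unchanged for clients that connect directly or do not send the property.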
### Verifying this change
- [ ] Make sure that the change passes the CI checks.
### Does this pull request potentially affect one of the following parts:
<!-- DO NOT REMOVE THIS SECTION. CHECK THE PROPER BOX ONLY. -->
*If the box was checked, please highlight the changes*
- [ ] Dependencies (add or upgrade a dependency)
- [ ] The public API
- [ ] The schema
- [ ] The default values of configurations
- [ ] The threading model
- [ ] The binary protocol
- [ ] The REST endpoints
- [ ] The admin CLI options
- [ ] The metrics
- [ ] Anything that affects deployment
### Documentation
<!-- DO NOT REMOVE THIS SECTION. CHECK THE PROPER BOX ONLY. -->
- [ ] `doc` <!-- Your PR contains doc changes. -->
- [ ] `doc-required` <!-- Your PR changes impact docs and you will update later -->
- [ ] `doc-not-needed` <!-- Your PR changes do not impact docs -->
- [ ] `doc-complete` <!-- Docs have been already added -->
### Matching PR in forked repository
PR in forked repository: <!-- ENTER URL HERE -->