lhotari opened a new pull request #14078:
URL: https://github.com/apache/pulsar/pull/14078
Fixes #14075
Fixes #13923
### Motivation
Pulsar Proxy can get into a state where it stops proxying Broker connections
while Admin API proxying keeps working.
The proxy logs fill up with warnings like this:
```
[pulsar-proxy-io-2-1] WARN org.apache.pulsar.client.impl.ConnectionPool -
Failed to open connection to pulsar-dev-broker/172.20.4.120:6650 :
io.netty.channel.AbstractChannel$AnnotatedConnectException:
connect(..) failed: Cannot assign requested address:
pulsar-dev-broker.pulsar.svc.cluster.local/172.20.4.120:6650
```
The "Cannot assign requested address" error typically indicates port
exhaustion: so many connections are open (possibly hanging) that no local
port is left for a new outbound connection.
### Additional context
One possible cause of the broken, hanging connections is a race
condition that shows up in the logs like this:
```
[pulsar-proxy-io-2-3] WARN io.netty.channel.DefaultChannelPipeline - An
exceptionCaught() event was fired, and it reached at the tail of the pipeline.
It usually means the last handler in the pipeline did not handle the exception.
java.lang.UnsupportedOperationException: null
    at org.apache.pulsar.common.protocol.PulsarDecoder.handleProducer(PulsarDecoder.java:479)
    at org.apache.pulsar.common.protocol.PulsarDecoder.channelRead(PulsarDecoder.java:193)
    at org.apache.pulsar.proxy.server.ProxyConnection.channelRead(ProxyConnection.java:193)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [io.netty-netty-transport-4.1.72.Final.jar:4.1.72.Final]
```
This is reported as #13923 and is fixed in this PR.
### Modifications
- Optimize the proxy connection to fail fast if the target broker isn't active
  - This reduces the number of hanging connections, since the proxy no longer attempts to reach unavailable brokers.
  - The Pulsar client will retry connecting after a backoff timeout.
- Fix the race condition in the Pulsar Proxy when opening a connection, since it could lead to invalid states and hanging connections
- Add connect timeout handling to the proxy connection
  - Defaults to 10000 ms, which is also the default of the client's connect timeout
- Add read timeout handling to the incoming connection and the proxied connection
  - The ping/pong keepalive messages should prevent the timeout from happening; however, the connection might be in a state where keepalives aren't happening.
  - Therefore, it's better to have a connection-level read timeout to prevent broken connections from being left hanging in the proxy.
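
To illustrate why both a connect timeout and a read timeout help against hanging connections, here is a minimal sketch using plain `java.net` sockets. It is not the code in this PR (the proxy is built on Netty); the class `TimeoutSketch` and its helpers `open` and `readTimesOut` are hypothetical names chosen for this example, and the 200 ms read timeout is an arbitrary value. Only the 10000 ms connect timeout comes from the description above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical illustration only; not part of the Pulsar code base.
class TimeoutSketch {

    // Opens a connection that gives up if the connect takes too long and
    // whose blocking reads fail once the peer has been silent for too long.
    static Socket open(InetSocketAddress target, int connectTimeoutMs, int readTimeoutMs)
            throws IOException {
        Socket socket = new Socket();
        try {
            // Throws SocketTimeoutException if no connection is established in time.
            socket.connect(target, connectTimeoutMs);
            // Subsequent blocking reads throw SocketTimeoutException after this much silence.
            socket.setSoTimeout(readTimeoutMs);
            return socket;
        } catch (IOException e) {
            socket.close(); // do not leak the socket (and its local port) on failure
            throw e;
        }
    }

    // Connects to a local peer that accepts the connection but never sends data,
    // and reports whether the read timeout aborted the read.
    static boolean readTimesOut(int readTimeoutMs) throws IOException {
        try (ServerSocket silentPeer = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            InetSocketAddress target =
                    new InetSocketAddress(silentPeer.getInetAddress(), silentPeer.getLocalPort());
            try (Socket client = open(target, 10_000, readTimeoutMs);
                 Socket accepted = silentPeer.accept()) {
                InputStream in = client.getInputStream();
                try {
                    in.read();      // the peer stays silent, so this blocks
                    return false;   // only reached if data or EOF arrived
                } catch (SocketTimeoutException e) {
                    return true;    // the read timeout fired instead of hanging forever
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("read timed out: " + readTimesOut(200));
    }
}
```

Without the read timeout, the `in.read()` call in this sketch would block indefinitely on the silent peer while the connection keeps holding a local port, which is the kind of leftover connection the modifications above aim to clean up.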