lhotari opened a new pull request #14078:
URL: https://github.com/apache/pulsar/pull/14078


   Fixes #14075
   Fixes #13923
   
   ### Motivation
   
   Pulsar Proxy can get into a state where it stops proxying Broker connections while Admin API proxying keeps working.
   The proxy logs fill up with warnings of this type:
   ```
   [pulsar-proxy-io-2-1] WARN  org.apache.pulsar.client.impl.ConnectionPool - Failed to open connection to pulsar-dev-broker/172.20.4.120:6650 : io.netty.channel.AbstractChannel$AnnotatedConnectException:
   connect(..) failed: Cannot assign requested address: pulsar-dev-broker.pulsar.svc.cluster.local/172.20.4.120:6650
   ```
   The "Cannot assign requested address" error message is a sign of port exhaustion: many connections are open, and some of them may be hanging.
   
   
   ### Additional context
   
   One possible cause of the broken, hanging connections is a race condition that shows up in the logs like this:
   ```
   [pulsar-proxy-io-2-3] WARN  io.netty.channel.DefaultChannelPipeline - An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception.
   java.lang.UnsupportedOperationException: null
           at org.apache.pulsar.common.protocol.PulsarDecoder.handleProducer(PulsarDecoder.java:479)
           at org.apache.pulsar.common.protocol.PulsarDecoder.channelRead(PulsarDecoder.java:193)
           at org.apache.pulsar.proxy.server.ProxyConnection.channelRead(ProxyConnection.java:193)
           at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [io.netty-netty-transport-4.1.72.Final.jar:4.1.72.Final]
   ```
   This is reported as #13923 and is fixed as part of this PR.
   
   ### Modifications
   
   - Optimize the proxy connection to fail fast if the target broker isn't active
     - This reduces the number of hanging connections because the proxy no longer attempts to reach unavailable brokers.
     - Pulsar client will retry connecting after a backoff timeout
   
   - Fix the race condition in the Pulsar Proxy when opening a connection, since it could lead to invalid states and hanging connections
   
   - Add connect timeout handling to proxy connection
     - defaults to 10000 ms, which is also the default of the client's connect timeout
   
   - Add read timeout handling to incoming connection and proxied connection
     - the ping/pong keepalive messages should prevent the timeout from happening; however, the connection might be in a state where keepalives aren't happening.
       - therefore, it's better to have a connection-level read timeout to prevent broken connections from being left hanging in the proxy
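
The two timeout kinds listed above can be illustrated with a minimal sketch. It is not the code from this PR: it uses only `java.net` blocking sockets so that it runs without dependencies, and the class name `TimeoutSketch`, the method `closedAfterReadTimeout`, and the 200 ms read timeout are hypothetical names and values chosen for illustration. Netty offers analogous facilities (`ChannelOption.CONNECT_TIMEOUT_MILLIS`, `ReadTimeoutHandler`), though whether the PR uses them is not stated in this description.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class TimeoutSketch {
    // 10000 ms mirrors the connect timeout default named in the PR description.
    static final int CONNECT_TIMEOUT_MS = 10_000;

    /**
     * Connects with a bounded connect timeout, then waits for a single byte
     * with a bounded read timeout. Returns true if the peer stayed silent for
     * the whole read timeout; the socket is closed in every case, so a silent
     * peer cannot leave the connection hanging.
     */
    static boolean closedAfterReadTimeout(InetSocketAddress target,
                                          int connectTimeoutMs,
                                          int readTimeoutMs) throws IOException {
        try (Socket socket = new Socket()) {
            // Throws SocketTimeoutException if the connection cannot be
            // established within connectTimeoutMs.
            socket.connect(target, connectTimeoutMs);
            // Any blocking read now fails after readTimeoutMs without data.
            socket.setSoTimeout(readTimeoutMs);
            InputStream in = socket.getInputStream();
            try {
                in.read();
                return false; // data arrived or the peer closed the stream
            } catch (SocketTimeoutException e) {
                return true;  // silent peer: give up, try-with-resources closes
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // A local listener that completes the TCP handshake (via the accept
        // backlog) but never sends anything, standing in for a broken peer.
        try (ServerSocket silent = new ServerSocket(0)) {
            InetSocketAddress addr = new InetSocketAddress(
                    InetAddress.getLoopbackAddress(), silent.getLocalPort());
            System.out.println("read timed out: "
                    + closedAfterReadTimeout(addr, CONNECT_TIMEOUT_MS, 200));
        }
    }
}
```

The point of the sketch is that the read timeout is enforced at the connection level, independently of any application-level keepalive, which matches the reasoning in the last bullet above.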

