michaeljmarshall opened a new pull request, #19327:
URL: https://github.com/apache/pulsar/pull/19327

   Fixes https://github.com/apache/pulsar/issues/13923
   
   ### Motivation
   
   When the Java client's `ClientCnx` is initialized, there is a race to set 
the target broker and the remote hostname. The target brorker is relevant when 
connecting through the proxy and the remote hostname is relevant for certain 
kinds of authentication, like SASL. In most cases, this race results in the 
preferred order where these values are set before the `ClientCnx#channelActive` 
method is called. However, when that method is called before they are set, 
problems occur.
   
   This PR ensures that the fields are set before we call 
`channel.connect(...)`. 
   
   Additional motivation for understanding the correctness of this solution can 
be found in the relevant netty code:
   
   
https://github.com/netty/netty/blob/cbd324c178135a82f23749bc218c2c6ee3a9b140/transport-classes-epoll/src/main/java/io/netty/channel/epoll/AbstractEpollChannel.java#L648-L659
   
   In that block, notice that the `isActive` gets set to `true`, the promise 
gets completed (which likely triggers callbacks), and then the channel active 
event gets triggered. Interestingly, the easiest way to reproduce the 
problematic behavior in the proxy is with the following steps:
   
   1. Update `createConnection(InetSocketAddress logicalAddress, 
InetSocketAddress physicalAddress, int connectionKey)` so that the callback for 
`createConnection` is `thenAcceptAsync` instead of `thenAccept`.
   2. Run `ProxyTest#testProducer()`.
   3. Observe failure in logs.
   
   ### Modifications
   
   * Update `ConnectionPool` to pass the logical broker address through to the 
initialization methods.
   * Update logic to compare the logical and physical addresses because one 
address is resolved.
   
   ### Verifying this change
   
   It is challenging to create a pure reproducer for this case. I was easily 
able to reproduce it by making the callback complete in another thread. Here is 
the `thenAccept` that I changed to `thenAcceptAsync`:
   
   
https://github.com/apache/pulsar/blob/3c06a4a19617006ba6c9fce6f4409aaf075216a9/pulsar-client/src/main/java/org/apache/pulsar/client/impl/ConnectionPool.java#L247
   
   At the very least, the current test coverage will ensure this does not 
introduce a regression.
   
   ### Does this pull request potentially affect one of the following parts:
   
   This is not a breaking change.
   
   ### Documentation
   
   - [x] `doc-not-needed`
   
   ### Matching PR in forked repository
   
   PR in forked repository: https://github.com/michaeljmarshall/pulsar/pull/20


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to