BewareMyPower opened a new pull request #15009: URL: https://github.com/apache/pulsar/pull/15009
### Motivation Recently we found C++ client might not refresh the OAuth2 token after the reconnection. It's because when Pulsar proxy is enabled, the `AuthResponse` might be sent by proxy, which leads to a disconnection from broker side. See https://github.com/apache/pulsar/blob/94cc46a66fbe322472dbb657803d21320e59079c/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/ServerCnx.java#L687 Then, proxy will return a `ServiceNotReady` error to client as the result of topic lookup. However, in this case, C++ client only completes the future of lookup with `ResultConnectError`. The `ClientConnection` object is still cached in the pool and will be used for followed interactions. In addition, the `AuthChallenge` request from broker might never be sent to client when proxy is enabled. So there is no chance for client to refresh the token unless a new connection is established. ### Modifications - Like what Java client does, add `checkServerError` method to `ClientConnection`, which closes the socket for `ServiceNotReady` error. Here we don't call `close` directly just not to interrupt the execution of `handleIncomingCommand`. `close` will be called in `handleRead` because the `err` will be `boost::asio::error::bad_descriptor`. - To ensure `ClientConnection::close` can remove itself from the connection pool, add `remove` method in `ConnectionPool` to remove a connection by address. Then hold a reference to the pool in `ClientConnection`. At the beginning of `close()`, remove the connection from pool. ### Verifying this change It's hard to mock the `ServiceNotReady` case in unit test. I have tested this patch in my private env and let the OAuth2 token expire quickly. We can see the following logs (hosts are hidden): ``` 2022-04-03 13:07:55.616 ERROR [0x700006480000] ClientConnection:1002 | [<local-ip>:62059 -> <remote-ip>:6651] Failed lookup req_id: 136 error: ServiceUnitNotReady msg: writeAddress(..) failed: Broken pipe 2022-04-03 13:07:55.616 ERROR [0x700006480000] ClientImpl:443 | Error getting topic partitions metadata: ServiceUnitNotReady 2022-04-03 13:07:55.617 ERROR [0x700006480000] ClientConnection:599 | [<local-ip>:62059 -> <remote-ip>:6651] Read operation failed: Bad file descriptor 2022-04-03 13:07:55.617 INFO [0x700006480000] ConnectionPool:119 | Removing connection from pool for <proxy-url> use_count: 3 @ 0x7f7d3680ec00 2022-04-03 13:07:55.617 INFO [0x700006480000] ClientConnection:1568 | [<local-ip>:62059 -> <remote-ip>:6651] Connection closed 2022-04-03 13:07:55.617 INFO [0x700006480000] ClientConnection:265 | [<local-ip>:62059 -> <remote-ip>:6651] Destroyed connection 2022-04-03 13:07:56.618 INFO [0x113d8d600] ClientConnection:191 | [<none> -> <proxy-url>] Create ClientConnection, timeout=10000 2022-04-03 13:08:00.841 INFO [0x113d8d600] ConnectionPool:97 | Created connection for <proxy-url> 2022-04-03 13:08:01.348 INFO [0x700006480000] ClientConnection:377 | [<local-ip>:62552 -> <remote-ip>:6651] Connected to broker ``` We can see a new connection (with a new port) was established after handling the `ServiceUnitNotReady` error. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
