suiyuzeng opened a new issue #12165:
URL: https://github.com/apache/pulsar/issues/12165
**Describe the bug**
One client has about 500 producers and consumers that belong to different bundles. When a bundle is unloaded, some producers or consumers that do not belong to that bundle also register again, for example:
2021-09-18 16:28:37,000 [ForkJoinPool.commonPool-worker-12] INFO org.apache.pulsar.broker.service.ServerCnx - [/xx.xx.xx.aa:51064] Created new producer: Producer{topic=PersistentTopic{topic=persistent://public/default/iot-pulsar-press-abcdefghijklABCDEFGHIJKLM-227091}, client=/xx.xx.xx.aa:51064, producerName=pulsar-cluster-iot-1-24844, producerId=1091}
The client log is as follows:
2021-09-18 16:28:33,036 [ ERROR ] ClientCnx - [id: 0x08ccf95e, L:/xx.xx.xx.aa:38706 - R:pulsar-broker-test.docker.ys/xx.xx.xx.bb:6650] Close connection because received internal-server error java.lang.IllegalStateException: Namespace bundle public/default/0x1c000000_0x20000000 is being unloaded
2021-09-18 16:28:33,490 [ INFO ] ClientCnx - [id: 0x08ccf95e, L:/xx.xx.xx.aa:38706 ! R:pulsar-broker-test.docker.ys/xx.xx.xx.bb:6650] Disconnected
2021-09-18 16:28:34,351 [ INFO ] ConnectionHandler - [persistent://public/default/iot-pulsar-press-abcdefghijklABCDEFGHIJKLM-227091] [pulsar-cluster-iot-1-24844] Closed connection [id: 0x08ccf95e, L:/xx.xx.xx.aa:38706 ! R:pulsar-broker-test.docker.ys/xx.xx.xx.bb:6650] -- Will try again in 0.1 s
2021-09-18 16:28:34,651 [ INFO ] ConnectionHandler - [persistent://public/default/iot-pulsar-press-abcdefghijklABCDEFGHIJKLM-227091] [pulsar-cluster-iot-1-24844] Reconnecting after timeout
2021-09-18 16:28:36,987 [ INFO ] ProducerImpl - [persistent://public/default/iot-pulsar-press-abcdefghijklABCDEFGHIJKLM-227091] [pulsar-cluster-iot-1-24844] Creating producer on cnx [id: 0xb51e7cd6, L:/xx.xx.xx.aa:51064 - R:pulsar-broker-test.docker.ys/xx.xx.xx.bb:6650]
2021-09-18 16:28:37,896 [ INFO ] ProducerImpl - [persistent://public/default/iot-pulsar-press-abcdefghijklABCDEFGHIJKLM-227091] [pulsar-cluster-iot-1-24844] Created producer on cnx [id: 0xb51e7cd6, L:/xx.xx.xx.aa:51064 - R:pulsar-broker-test.docker.ys/xx.xx.xx.bb:6650]
The cause is that the client closes the whole connection when a lookup fails, in org.apache.pulsar.client.impl.ClientCnx#checkServerError:
if (ServerError.ServiceNotReady.equals(error)) {
    log.error("{} Close connection because received internal-server error {}", ctx.channel(), errMsg);
    ctx.close();
}
Closing the channel then triggers ClientCnx#channelInactive:
public void channelInactive(ChannelHandlerContext ctx) throws Exception {
    super.channelInactive(ctx);
    log.info("{} Disconnected", ctx.channel());
    if (!connectionFuture.isDone()) {
        connectionFuture.completeExceptionally(new PulsarClientException("Connection already closed"));
    }
    ConnectException e = new ConnectException(
            "Disconnected from server at " + ctx.channel().remoteAddress());
    // Fail out all the pending ops
    pendingRequests.forEach((key, future) -> future.completeExceptionally(e));
    waitingLookupRequests.forEach(pair -> pair.getRight().getRight().completeExceptionally(e));
    // Notify all attached producers/consumers so they have a chance to reconnect
    producers.forEach((id, producer) -> producer.connectionClosed(this));
    consumers.forEach((id, consumer) -> consumer.connectionClosed(this));
    transactionMetaStoreHandlers.forEach((id, handler) -> handler.connectionClosed(this));
    pendingRequests.clear();
    waitingLookupRequests.clear();
    producers.clear();
    consumers.clear();
    timeoutTask.cancel(true);
}
In org.apache.pulsar.client.impl.ClientCnx#channelInactive, connectionClosed is called on every producer and consumer that uses the connection, so each of them reconnects and registers again, even when its topic is not in the bundle being unloaded. In addition, because all pending requests are failed out, some messages may fail to be sent.
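A minimal sketch of this fan-out, using a hypothetical SharedCnxModel class rather than Pulsar's real ClientCnx: several handlers from different bundles share one connection, and a single ServiceNotReady reply for one bundle fails every pending request and makes every handler re-register. The class, its Handler record and onServiceNotReady are illustrative names only.

```java
import java.net.ConnectException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified model of one shared client connection. It is not
// Pulsar's ClientCnx; it only mirrors the fan-out in channelInactive.
class SharedCnxModel {

    // A producer or consumer attached to this connection, tagged with its bundle.
    record Handler(String topic, String bundle) {}

    final Map<Long, Handler> handlers = new ConcurrentHashMap<>();
    final Map<Long, CompletableFuture<String>> pendingRequests = new ConcurrentHashMap<>();
    final List<String> reRegistered = new ArrayList<>();

    // Mirrors checkServerError: a ServiceNotReady reply closes the whole
    // connection. The bundle argument is deliberately unused, because the
    // bundle plays no part in that decision.
    void onServiceNotReady(String unloadingBundle) {
        channelInactive();
    }

    // Mirrors channelInactive: every pending request is failed and every
    // attached handler is told to reconnect, whatever bundle it belongs to.
    void channelInactive() {
        ConnectException e = new ConnectException("Disconnected from server");
        pendingRequests.forEach((id, future) -> future.completeExceptionally(e));
        handlers.forEach((id, handler) -> reRegistered.add(handler.topic()));
        pendingRequests.clear();
        handlers.clear();
    }

    public static void main(String[] args) {
        SharedCnxModel cnx = new SharedCnxModel();
        cnx.handlers.put(1L, new Handler("topic-in-unloading-bundle", "0x1c000000_0x20000000"));
        cnx.handlers.put(2L, new Handler("topic-in-other-bundle", "0x20000000_0x24000000"));
        CompletableFuture<String> pendingRequest = new CompletableFuture<>();
        cnx.pendingRequests.put(10L, pendingRequest);

        cnx.onServiceNotReady("0x1c000000_0x20000000");

        // Both topics re-register, not only the one in the unloaded bundle.
        System.out.println("re-registered: " + cnx.reRegistered);
        System.out.println("pending request failed: " + pendingRequest.isCompletedExceptionally());
    }
}
```

The model deliberately keys nothing by bundle, which is the point: the decision to close is made per connection, not per bundle.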
In checkServerError, ServiceNotReady covers several internal-server errors, such as an IllegalStateException, a null lookupResult, and other exceptions. I think the connection itself is healthy, since it can still receive responses from the server. Moreover, the reconnect switches to another connection anyway (org.apache.pulsar.client.impl.PulsarServiceNameResolver#resolveHost picks another server). Is it necessary to close the connection?
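A simplified sketch of the reconnect target selection described above, assuming resolveHost rotates through the configured broker addresses in round-robin order and that the reconnect resolves the service URL again. RoundRobinResolverSketch is a hypothetical class; it is not PulsarServiceNameResolver's actual code.

```java
import java.net.InetSocketAddress;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, simplified round-robin resolver. It only illustrates the
// behavior the issue attributes to PulsarServiceNameResolver#resolveHost.
class RoundRobinResolverSketch {
    private final List<InetSocketAddress> addresses;
    private final AtomicInteger index = new AtomicInteger(-1);

    RoundRobinResolverSketch(List<InetSocketAddress> addresses) {
        if (addresses.isEmpty()) {
            throw new IllegalArgumentException("No hosts found for service URL");
        }
        this.addresses = List.copyOf(addresses);
    }

    // Each call advances to the next broker address, so with more than one
    // broker, consecutive calls return different hosts.
    InetSocketAddress resolveHost() {
        if (addresses.size() == 1) {
            return addresses.get(0);
        }
        int i = index.updateAndGet(last -> (last + 1) % addresses.size());
        return addresses.get(i);
    }

    public static void main(String[] args) {
        RoundRobinResolverSketch resolver = new RoundRobinResolverSketch(List.of(
                InetSocketAddress.createUnresolved("broker-1", 6650),
                InetSocketAddress.createUnresolved("broker-2", 6650)));
        for (int attempt = 1; attempt <= 3; attempt++) {
            System.out.println("attempt " + attempt + " -> " + resolver.resolveHost());
        }
    }
}
```

Under this assumption, a producer that reconnects after a failed lookup already moves to a different host, so keeping the old connection open would not pin it to the broker that is unloading the bundle.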
Possible solutions, in my opinion:
1. Do not close the connection. However, checkServerError currently handles only ServiceNotReady and TooManyRequests, so is there some issue that requires closing the connection in this case?
2. Add a new error code. Since ServiceNotReady is used in many other places, changing how it is handled may cause other problems.
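The two options can be contrasted in a hypothetical sketch. Nothing here is Pulsar API: ErrorCode, BUNDLE_UNAVAILABLE and Option are invented names, used only to show which side changes under each option when a lookup fails because the bundle is being unloaded.

```java
// Hypothetical sketch of the two proposed options. None of these names are
// Pulsar API: ErrorCode, BUNDLE_UNAVAILABLE and Option are illustrative only.
class ProposedOptionsSketch {

    enum ErrorCode {
        SERVICE_NOT_READY,  // stands in for ServerError.ServiceNotReady
        BUNDLE_UNAVAILABLE  // hypothetical new code (Option 2)
    }

    enum Option { CURRENT, DO_NOT_CLOSE, NEW_ERROR_CODE }

    // Code the broker would send when a lookup fails because the bundle is
    // being unloaded. Only Option 2 changes what the broker sends.
    static ErrorCode brokerReply(Option option) {
        return option == Option.NEW_ERROR_CODE
                ? ErrorCode.BUNDLE_UNAVAILABLE
                : ErrorCode.SERVICE_NOT_READY;
    }

    // Whether the client closes the shared connection on receiving that code.
    static boolean clientClosesConnection(Option option, ErrorCode code) {
        switch (option) {
            case DO_NOT_CLOSE:
                // Option 1: fail only the request that got the error and keep
                // the connection (and its other producers/consumers) alive.
                return false;
            default:
                // CURRENT and Option 2: ServiceNotReady still closes the
                // connection; only the new code leaves it open.
                return code == ErrorCode.SERVICE_NOT_READY;
        }
    }

    public static void main(String[] args) {
        for (Option option : Option.values()) {
            ErrorCode code = brokerReply(option);
            System.out.println(option + ": broker sends " + code
                    + ", client closes connection = " + clientClosesConnection(option, code));
        }
    }
}
```

In this sketch, Option 1 changes client behavior for every ServiceNotReady reply, while Option 2 leaves existing ServiceNotReady handling untouched and only adds a narrower code, which matches the concern raised in option 2.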
Which one is better? Or is there another solution?
Thanks.