suiyuzeng opened a new issue #12165:
URL: https://github.com/apache/pulsar/issues/12165


   **Describe the bug**
   One client has about 500 producers and consumers that belong to different 
bundles. When one bundle is unloaded, some producers and consumers that do not 
belong to that bundle also reconnect and register again. For example, the 
broker logs:
   2021-09-18 16:28:37,000 [ForkJoinPool.commonPool-worker-12] INFO  
org.apache.pulsar.broker.service.ServerCnx - [/xx.xx.xx.aa:51064] Created new 
producer: 
Producer{topic=PersistentTopic{topic=persistent://public/default/iot-pulsar-press-abcdefghijklABCDEFGHIJKLM-227091},
 client=/xx.xx.xx.aa:51064, producerName=pulsar-cluster-iot-1-24844, 
producerId=1091}
   
   
   The client log is as follows:
   2021-09-18 16:28:33,036 [ ERROR ] ClientCnx - [id: 0x08ccf95e, 
L:/xx.xx.xx.aa:38706 - R:pulsar-broker-test.docker.ys/xx.xx.xx.bb:6650] Close 
connection because received internal-server error 
java.lang.IllegalStateException: Namespace bundle 
public/default/0x1c000000_0x20000000 is being unloaded
   2021-09-18 16:28:33,490 [ INFO ] ClientCnx - [id: 0x08ccf95e, 
L:/xx.xx.xx.aa:38706 ! R:pulsar-broker-test.docker.ys/xx.xx.xx.bb:6650] 
Disconnected
   2021-09-18 16:28:34,351 [ INFO ] ConnectionHandler - 
[persistent://public/default/iot-pulsar-press-abcdefghijklABCDEFGHIJKLM-227091] 
[pulsar-cluster-iot-1-24844] Closed connection [id: 0x08ccf95e, 
L:/xx.xx.xx.aa:38706 ! R:pulsar-broker-test.docker.ys/xx.xx.xx.bb:6650] -- Will 
try again in 0.1 s
   2021-09-18 16:28:34,651 [ INFO ] ConnectionHandler - 
[persistent://public/default/iot-pulsar-press-abcdefghijklABCDEFGHIJKLM-227091] 
[pulsar-cluster-iot-1-24844] Reconnecting after timeout
   2021-09-18 16:28:36,987 [ INFO ] ProducerImpl - 
[persistent://public/default/iot-pulsar-press-abcdefghijklABCDEFGHIJKLM-227091] 
[pulsar-cluster-iot-1-24844] Creating producer on cnx [id: 0xb51e7cd6, 
L:/xx.xx.xx.aa:51064 - R:pulsar-broker-test.docker.ys/xx.xx.xx.bb:6650]
   2021-09-18 16:28:37,896 [ INFO ] ProducerImpl - 
[persistent://public/default/iot-pulsar-press-abcdefghijklABCDEFGHIJKLM-227091] 
[pulsar-cluster-iot-1-24844] Created producer on cnx [id: 0xb51e7cd6, 
L:/xx.xx.xx.aa:51064 - R:pulsar-broker-test.docker.ys/xx.xx.xx.bb:6650]
   
   The cause is that the client closes the connection when a lookup fails, in 
org.apache.pulsar.client.impl.ClientCnx#checkServerError:
   
   if (ServerError.ServiceNotReady.equals(error)) {
       log.error("{} Close connection because received internal-server error {}",
               ctx.channel(), errMsg);
       ctx.close();
   }
   
   Closing the connection triggers channelInactive:
   
   public void channelInactive(ChannelHandlerContext ctx) throws Exception {
       super.channelInactive(ctx);
       log.info("{} Disconnected", ctx.channel());
       if (!connectionFuture.isDone()) {
           connectionFuture.completeExceptionally(
                   new PulsarClientException("Connection already closed"));
       }
   
       ConnectException e = new ConnectException(
               "Disconnected from server at " + ctx.channel().remoteAddress());
   
       // Fail out all the pending ops
       pendingRequests.forEach((key, future) -> future.completeExceptionally(e));
       waitingLookupRequests.forEach(pair ->
               pair.getRight().getRight().completeExceptionally(e));
   
       // Notify all attached producers/consumers so they have a chance to reconnect
       producers.forEach((id, producer) -> producer.connectionClosed(this));
       consumers.forEach((id, consumer) -> consumer.connectionClosed(this));
       transactionMetaStoreHandlers.forEach((id, handler) ->
               handler.connectionClosed(this));
   
       pendingRequests.clear();
       waitingLookupRequests.clear();
   
       producers.clear();
       consumers.clear();
   
       timeoutTask.cancel(true);
   }
   
   In org.apache.pulsar.client.impl.ClientCnx#channelInactive, connectionClosed 
is called on every producer and consumer that uses the connection, so each of 
them reconnects and registers again. Some messages may also fail to send, 
because all pending requests are failed.
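
   To illustrate the effect described above, here is a minimal, self-contained 
sketch. It does not use the real Pulsar classes: SharedConnectionSketch, 
FakeProducer and reconnectsAfterLookupFailure are hypothetical names that only 
model one connection shared by many producers, where closing the connection 
notifies every attached handler.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for ClientCnx: one connection shared by many producers.
class SharedConnectionSketch {
    interface Handler {
        void connectionClosed();
    }

    // Hypothetical stand-in for a producer; it only counts reconnect attempts.
    static class FakeProducer implements Handler {
        final String topic;
        int reconnects = 0;

        FakeProducer(String topic) {
            this.topic = topic;
        }

        @Override
        public void connectionClosed() {
            reconnects++; // a real producer would schedule a reconnect here
        }
    }

    private final List<Handler> handlers = new ArrayList<>();

    void register(Handler handler) {
        handlers.add(handler);
    }

    // Models the effect of channelInactive: every attached handler is notified,
    // regardless of which topic or bundle caused the close.
    void close() {
        List<Handler> snapshot = new ArrayList<>(handlers);
        handlers.clear();
        snapshot.forEach(Handler::connectionClosed);
    }

    // Returns the total number of reconnects caused by a single close.
    static int reconnectsAfterLookupFailure(int producerCount) {
        SharedConnectionSketch cnx = new SharedConnectionSketch();
        List<FakeProducer> producers = new ArrayList<>();
        for (int i = 0; i < producerCount; i++) {
            FakeProducer producer = new FakeProducer("topic-" + i);
            producers.add(producer);
            cnx.register(producer);
        }
        cnx.close(); // one failed lookup closes the shared connection
        return producers.stream().mapToInt(p -> p.reconnects).sum();
    }

    public static void main(String[] args) {
        System.out.println(reconnectsAfterLookupFailure(500)); // prints 500
    }
}
```

   In this model, a single failed lookup makes all 500 producers reconnect, 
even though only one topic was affected by the unload.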
   
   In checkServerError, ServiceNotReady covers several internal server errors, 
such as IllegalStateException, a null lookup result, and other exceptions. I 
think the connection itself is healthy, since it can still receive responses 
from the server. In addition, the client may switch to a different server when 
reconnecting (org.apache.pulsar.client.impl.PulsarServiceNameResolver#resolveHost 
picks another host). Is it necessary to close the connection?
   
   Possible solutions, in my opinion:
   1. Do not close the connection. However, checkServerError only handles 
ServiceNotReady and TooManyRequests. Is there a known issue that requires 
closing the connection?
   2. Add another error code. ServiceNotReady is used in many other places, so 
changing how it is handled may cause other problems.
   
   Which one is better? Or is there another solution?
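
   As a rough sketch of option 1, the idea would be to fail only the lookup 
request that received the error and leave the connection open. This is not the 
actual ClientCnx code: LookupErrorSketch, newLookup, handleLookupError and 
isOpen are hypothetical names, and the ServerError enum here is a simplified 
stand-in for the real one.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of a connection that tracks pending lookups by request id.
class LookupErrorSketch {
    // Simplified stand-in for the real ServerError enum.
    enum ServerError { ServiceNotReady, TooManyRequests }

    private final Map<Long, CompletableFuture<String>> pendingLookups =
            new ConcurrentHashMap<>();
    private boolean open = true;

    CompletableFuture<String> newLookup(long requestId) {
        CompletableFuture<String> future = new CompletableFuture<>();
        pendingLookups.put(requestId, future);
        return future;
    }

    // Option 1: fail only the request that got the error. The connection and
    // all other pending requests are left untouched.
    void handleLookupError(long requestId, ServerError error, String message) {
        CompletableFuture<String> future = pendingLookups.remove(requestId);
        if (future != null) {
            future.completeExceptionally(
                    new IllegalStateException(error + ": " + message));
        }
    }

    boolean isOpen() {
        return open;
    }

    public static void main(String[] args) {
        LookupErrorSketch cnx = new LookupErrorSketch();
        CompletableFuture<String> failed = cnx.newLookup(1L);
        CompletableFuture<String> other = cnx.newLookup(2L);
        cnx.handleLookupError(1L, ServerError.ServiceNotReady,
                "Namespace bundle is being unloaded");
        System.out.println(failed.isCompletedExceptionally());
        System.out.println(other.isDone());
        System.out.println(cnx.isOpen());
    }
}
```

   With this approach, only the caller of the failed lookup sees an error and 
can retry it; producers and consumers on other topics keep using the same 
connection.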
   
   Thanks.

