hepyu opened a new issue, #16327:
URL: https://github.com/apache/pulsar/issues/16327

   **Directory**
   **(1).pulsar cluster info**
   **(2).accident site statement**
   **(3).I tracked**
   **(4). I guess**
   
   **(1).pulsar cluster info**
   
   Pulsar cluster is deployed in aws ec2, six ec2 totally.
   
   3 ec2: ec2 is 4c8g, only deployed broker per ec2.
   3 ec2: ec2 is 8c16g, deployed 1 broker and 1 bookie per ec2.
   
   pulsar version: 2.8.0
   
   **(2).accident site statement**
   
   One broker of cluster occur bundle unload error exception stack info at 
00:12:31 as this:
   
   ```
   00:12:31.855 [AsyncHttpClient-48-1] WARN  
org.apache.pulsar.client.admin.internal.BaseResource - 
[http://10.0.44.174:8080/admin/v2/namespaces/xxx/public/0xb0000000_0xc0000000/unload]
 Failed to perform http put request: javax.ws.rs.InternalServerErrorException: 
HTTP 500 Internal Server Error
   00:12:31.858 [pulsar-load-manager-1-1] WARN  
org.apache.pulsar.broker.loadbalance.impl.ModularLoadManagerImpl - Error when 
trying to perform load shedding on xxx/public/0xb0000000_0xc0000000 for broker 
10.0.37.216:8080
   
org.apache.pulsar.client.admin.PulsarAdminException$ServerSideErrorException: 
HTTP 500 Internal Server Error
           at 
org.apache.pulsar.client.admin.internal.BaseResource.getApiException(BaseResource.java:219)
 ~[org.apache.pulsar-pulsar-client-admin-original-2.8.0.jar:2.8.0]
           at 
org.apache.pulsar.client.admin.internal.BaseResource$1.failed(BaseResource.java:130)
 ~[org.apache.pulsar-pulsar-client-admin-original-2.8.0.jar:2.8.0]
           at 
org.glassfish.jersey.client.JerseyInvocation$1.failed(JerseyInvocation.java:882)
 ~[org.glassfish.jersey.core-jersey-client-2.34.jar:?]
           at 
org.glassfish.jersey.client.JerseyInvocation$1.completed(JerseyInvocation.java:863)
 ~[org.glassfish.jersey.core-jersey-client-2.34.jar:?]
           at 
org.glassfish.jersey.client.ClientRuntime.processResponse(ClientRuntime.java:229)
 ~[org.glassfish.jersey.core-jersey-client-2.34.jar:?]
           at 
org.glassfish.jersey.client.ClientRuntime.access$200(ClientRuntime.java:62) 
~[org.glassfish.jersey.core-jersey-client-2.34.jar:?]
           at 
org.glassfish.jersey.client.ClientRuntime$2.lambda$response$0(ClientRuntime.java:173)
 ~[org.glassfish.jersey.core-jersey-client-2.34.jar:?]
           at org.glassfish.jersey.internal.Errors$1.call(Errors.java:248) 
~[org.glassfish.jersey.core-jersey-common-2.34.jar:?]
           at org.glassfish.jersey.internal.Errors$1.call(Errors.java:244) 
~[org.glassfish.jersey.core-jersey-common-2.34.jar:?]
           at org.glassfish.jersey.internal.Errors.process(Errors.java:292) 
~[org.glassfish.jersey.core-jersey-common-2.34.jar:?]
           at org.glassfish.jersey.internal.Errors.process(Errors.java:274) 
~[org.glassfish.jersey.core-jersey-common-2.34.jar:?]
           at org.glassfish.jersey.internal.Errors.process(Errors.java:244) 
~[org.glassfish.jersey.core-jersey-common-2.34.jar:?]
           at 
org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:288)
 ~[org.glassfish.jersey.core-jersey-common-2.34.jar:?]
           at 
org.glassfish.jersey.client.ClientRuntime$2.response(ClientRuntime.java:173) 
~[org.glassfish.jersey.core-jersey-client-2.34.jar:?]
           at 
org.apache.pulsar.client.admin.internal.http.AsyncHttpConnector.lambda$apply$1(AsyncHttpConnector.java:212)
 ~[org.apache.pulsar-pulsar-client-admin-original-2.8.0.jar:2.8.0]
           at 
java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
 ~[?:?]
           at 
java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
 ~[?:?]
           at 
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) 
~[?:?]
           at 
java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2073) 
~[?:?]
           at 
org.apache.pulsar.client.admin.internal.http.AsyncHttpConnector.lambda$retryOperation$4(AsyncHttpConnector.java:254)
 ~[org.apache.pulsar-pulsar-client-admin-original-2.8.0.jar:2.8.0]
           at 
java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
 ~[?:?]
           at 
java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
 ~[?:?]
           at 
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) 
~[?:?]
           at 
java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2073) 
~[?:?]
           at 
org.asynchttpclient.netty.NettyResponseFuture.loadContent(NettyResponseFuture.java:222)
 ~[org.asynchttpclient-async-http-client-2.12.1.jar:?]
           at 
org.asynchttpclient.netty.NettyResponseFuture.done(NettyResponseFuture.java:257)
 ~[org.asynchttpclient-async-http-client-2.12.1.jar:?]
           at 
org.asynchttpclient.netty.handler.AsyncHttpClientHandler.finishUpdate(AsyncHttpClientHandler.java:241)
 ~[org.asynchttpclient-async-http-client-2.12.1.jar:?]
           at 
org.asynchttpclient.netty.handler.HttpHandler.handleChunk(HttpHandler.java:114) 
~[org.asynchttpclient-async-http-client-2.12.1.jar:?]
           at 
org.asynchttpclient.netty.handler.HttpHandler.handleRead(HttpHandler.java:143) 
~[org.asynchttpclient-async-http-client-2.12.1.jar:?]
           at 
org.asynchttpclient.netty.handler.AsyncHttpClientHandler.channelRead(AsyncHttpClientHandler.java:78)
 ~[org.asynchttpclient-async-http-client-2.12.1.jar:?]
           at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
 ~[io.netty-netty-transport-4.1.63.Final.jar:4.1.63.Final]
           at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
 ~[io.netty-netty-transport-4.1.63.Final.jar:4.1.63.Final]
           at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
 ~[io.netty-netty-transport-4.1.63.Final.jar:4.1.63.Final]
           at 
io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
 ~[io.netty-netty-codec-4.1.63.Final.jar:4.1.63.Final]
           at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
 ~[io.netty-netty-transport-4.1.63.Final.jar:4.1.63.Final]
           at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
 ~[io.netty-netty-transport-4.1.63.Final.jar:4.1.63.Final]
           at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
 ~[io.netty-netty-transport-4.1.63.Final.jar:4.1.63.Final]
           at 
io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireChannelRead(CombinedChannelDuplexHandler.java:436)
 ~[io.netty-netty-transport-4.1.63.Final.jar:4.1.63.Final]
           at 
io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:324)
 ~[io.netty-netty-codec-4.1.63.Final.jar:4.1.63.Final]
           at 
io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:296)
 ~[io.netty-netty-codec-4.1.63.Final.jar:4.1.63.Final]
           at 
io.netty.channel.CombinedChannelDuplexHandler.channelRead(CombinedChannelDuplexHandler.java:251)
 ~[io.netty-netty-transport-4.1.63.Final.jar:4.1.63.Final]
           at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
 ~[io.netty-netty-transport-4.1.63.Final.jar:4.1.63.Final]
           at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
 ~[io.netty-netty-transport-4.1.63.Final.jar:4.1.63.Final]
           at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
 ~[io.netty-netty-transport-4.1.63.Final.jar:4.1.63.Final]
           at 
io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
 ~[io.netty-netty-transport-4.1.63.Final.jar:4.1.63.Final]
           at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
 ~[io.netty-netty-transport-4.1.63.Final.jar:4.1.63.Final]
           at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
 ~[io.netty-netty-transport-4.1.63.Final.jar:4.1.63.Final]
           at 
io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
 ~[io.netty-netty-transport-4.1.63.Final.jar:4.1.63.Final]
           at 
io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
 ~[io.netty-netty-transport-4.1.63.Final.jar:4.1.63.Final]
           at 
io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:719) 
~[io.netty-netty-transport-4.1.63.Final.jar:4.1.63.Final]
           at 
io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:655)
 ~[io.netty-netty-transport-4.1.63.Final.jar:4.1.63.Final]
           at 
io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:581) 
~[io.netty-netty-transport-4.1.63.Final.jar:4.1.63.Final]
           at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493) 
~[io.netty-netty-transport-4.1.63.Final.jar:4.1.63.Final]
           at 
io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
 ~[io.netty-netty-common-4.1.63.Final.jar:4.1.63.Final]
           at 
io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) 
~[io.netty-netty-common-4.1.63.Final.jar:4.1.63.Final]
           at 
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
 [io.netty-netty-common-4.1.63.Final.jar:4.1.63.Final]
           at java.lang.Thread.run(Thread.java:829) [?:?]
   Caused by: javax.ws.rs.InternalServerErrorException: HTTP 500 Internal 
Server Error
           at 
org.glassfish.jersey.client.JerseyInvocation.convertToException(JerseyInvocation.java:960)
 ~[org.glassfish.jersey.core-jersey-client-2.34.jar:?]
           at 
org.glassfish.jersey.client.JerseyInvocation.access$700(JerseyInvocation.java:82)
 ~[org.glassfish.jersey.core-jersey-client-2.34.jar:?]
           ... 54 more
   ```
   
   And then, another broker occur error into "error Namespace bundle 
xxx/public/0xb0000000_0xc0000000 is being unloaded", its would be continuous 
printing, never stop.
   
   ```
   00:12:31.837 [bookkeeper-ml-scheduler-OrderedScheduler-0-0] ERROR 
org.apache.pulsar.broker.admin.impl.NamespacesBase - [null] Failed to unload 
namespace bundle xxx/public/0xb0000000_0xc0000000
   00:12:31.915 [pulsar-io-4-16] INFO  
org.apache.pulsar.broker.lookup.TopicLookupBase - Failed to lookup null for 
topic persistent://xxx/public/xxx-4 with error Namespace bundle 
xxx/public/0xb0000000_0xc0000000 is being unloaded
   00:12:31.916 [pulsar-io-4-16] WARN  
org.apache.pulsar.broker.service.ServerCnx - [/10.0.39.109:57794] Got exception 
io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: 
Connection reset by peer
   00:12:31.917 [pulsar-io-4-15] INFO  
org.apache.pulsar.broker.lookup.TopicLookupBase - Failed to lookup null for 
topic persistent://xxx/public/xxx-partition-2 with error Namespace bundle 
xxx/public/0xb0000000_0xc0000000 is being unloaded
   00:12:31.917 [pulsar-io-4-16] INFO  
org.apache.pulsar.broker.lookup.TopicLookupBase - Failed to lookup null for 
topic persistent://xxx/public/xxx_1001044 with error Namespace bundle 
xxx/public/0xb0000000_0xc0000000 is being unloaded
   00:12:31.917 [pulsar-io-4-15] WARN  
org.apache.pulsar.broker.service.ServerCnx - [/10.0.32.49:34574] Got exception 
io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: 
Connection reset by peer
   00:12:31.920 [pulsar-io-4-7] INFO  
org.apache.pulsar.broker.lookup.TopicLookupBase - Failed to lookup null for 
topic persistent://xxx/public/xxx-4 with error Namespace bundle 
xxx/public/0xb0000000_0xc0000000 is being unloaded
   00:12:31.920 [pulsar-io-4-4] INFO  
org.apache.pulsar.broker.lookup.TopicLookupBase - Failed to lookup null for 
topic persistent://xxx/public/xxx-partition-4 with error Namespace bundle 
xxx/public/0xb0000000_0xc0000000 is being unloaded
   00:12:31.921 [pulsar-io-4-9] INFO  
org.apache.pulsar.broker.lookup.TopicLookupBase - Failed to lookup null for 
topic persistent://xxx/public/xxx_message-partition-2 with error Namespace 
bundle xxx/public/0xb0000000_0xc0000000 is being unloaded
   00:12:31.922 [pulsar-io-4-16] INFO  
org.apache.pulsar.broker.lookup.TopicLookupBase - Failed to lookup null for 
topic persistent://xxx/public/xxx_message-partition-2 with error Namespace 
bundle xxx/public/0xb0000000_0xc0000000 is being unloaded
   ```
   
   And then 3 topics backlog occur massive accumulation!
   
   **(3).I tracked**
   
   Trigger point that bundle unload failed:
   <img width="1315" alt="image" 
src="https://user-images.githubusercontent.com/3917749/176862879-ae200fbe-7dca-4c64-b463-b55816f48942.png";>
   
   Result point that namespace bundle xxx is being unloaded:
   <img width="1169" alt="image" 
src="https://user-images.githubusercontent.com/3917749/176863064-4ac9572d-f51c-433b-a479-d7bf65e548c1.png";>
   
   **(4). I guess**
   
   Why occur trigger point, I don't know the exact cause, code is so large, 
dizzy.
   
   But I can guess something according to historical experiences, Topic 
accumulation has happened occasionally before, I think the broker is seriously 
short of resources and that's fact, so I double my resources for brokers, and 
never occur topic accumulation until this morning.
   
   I think this phenomenon can be avoided as much as possible by greatly 
increasing broker resources, because there are enough resources, unload will 
not occur. But this problem cannot be completely eliminated, because the 
production environment is complex,for example, the severe network fluctuation 
of AWS, or the sudden isolation of EC2 where a broker node is located, these 
emergencies may still trigger this problem.
   
   So, this bug shoud be solved still, otherwise stability is a big problem.
   
   
   
   
   
   
   
   
   
   
   
   
    
   
   
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to