xuet0ng opened a new issue #2190: [Agent] can not connect to the collector 
maybe cause memory leak
URL: https://github.com/apache/incubator-skywalking/issues/2190
 
 
   Please answer these questions before submitting your issue.
   
   - Why do you submit this issue?
   - [ ] Question or discussion
   - [x] Bug
   - [ ] Requirement
   - [ ] Feature or performance improvement
   
   ___
   ### Bug
   - Which version of SkyWalking, OS and JRE?
     - version: remotes/origin/v5.0.0-KXLGA
     - OS: centos 6.5
     - JRE: java version "1.8.0_144"
   - Which company or project?
   
   - What happened?
   If possible, provide a way to reproduce the error, e.g. demo application, component version.
     - background
       - java, spring cloud, micro-service
     - what happened
       - several services became unavailable on different machines at almost the same time.
       - java service log says
       ```
       java.lang.OutOfMemoryError: GC overhead limit exceeded
       ```
       - skywalking log says
       ```
       00:02:21.725 [SkywalkingAgent-2-GRPCChannelManager-0] ERROR o.a.s.a.d.i.g.i.ManagedChannelOrphanWrapper - *~*~*~ Channel ManagedChannelImpl{logId=1042418, target=172.21.16.175:11800} was not shutdown properly!!! ~*~*~*
           Make sure to call shutdown()/shutdownNow() and wait until awaitTermination() returns true.
       java.lang.RuntimeException: ManagedChannel allocation site
               at org.apache.skywalking.apm.dependencies.io.grpc.internal.ManagedChannelOrphanWrapper$ManagedChannelReference.<init>(ManagedChannelOrphanWrapper.java:103)
               at org.apache.skywalking.apm.dependencies.io.grpc.internal.ManagedChannelOrphanWrapper.<init>(ManagedChannelOrphanWrapper.java:53)
               at org.apache.skywalking.apm.dependencies.io.grpc.internal.ManagedChannelOrphanWrapper.<init>(ManagedChannelOrphanWrapper.java:44)
               at org.apache.skywalking.apm.dependencies.io.grpc.internal.AbstractManagedChannelImplBuilder.build(AbstractManagedChannelImplBuilder.java:410)
               at org.apache.skywalking.apm.agent.core.remote.GRPCChannel.<init>(GRPCChannel.java:46)
               at org.apache.skywalking.apm.agent.core.remote.GRPCChannel.<init>(GRPCChannel.java:31)
               at org.apache.skywalking.apm.agent.core.remote.GRPCChannel$Builder.build(GRPCChannel.java:95)
               at org.apache.skywalking.apm.agent.core.remote.GRPCChannelManager.run(GRPCChannelManager.java:99)
               at org.apache.skywalking.apm.util.RunnableWithExceptionProtection.run(RunnableWithExceptionProtection.java:36)
               at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
               at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
               at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
               at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
               at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
               at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
               at java.lang.Thread.run(Thread.java:748)
       00:03:21.724 [SkywalkingAgent-2-GRPCChannelManager-0] ERROR o.a.s.a.d.i.g.i.ManagedChannelOrphanWrapper - *~*~*~ Channel ManagedChannelImpl{logId=1042467, target=172.21.16.175:11800} was not shutdown properly!!! ~*~*~*
           Make sure to call shutdown()/shutdownNow() and wait until awaitTermination() returns true.
       ```
       - abnormal Java objects, found with 'jmap -histo:live {pid} > ~/histolive'
       ```
         num     #instances         #bytes  class name
        ----------------------------------------------
        [xxx] [xxx@xxxxxxxxxxxx ~]$ grep '39013' histolive 
        4:         39013       20598600  [Lorg.apache.skywalking.apm.dependencies.io.netty.handler.codec.http2.HpackHeaderField;
        11:         39013        8426808  org.apache.skywalking.apm.dependencies.io.grpc.internal.ManagedChannelImpl
        30:         39013        3745248  org.apache.skywalking.apm.dependencies.io.grpc.internal.TransportTracer
        40:         39013        2808936  org.apache.skywalking.apm.dependencies.io.netty.channel.DefaultChannelHandlerContext
        47:         39013        2496832  org.apache.skywalking.apm.dependencies.io.grpc.internal.DelayedClientTransport
        48:         39013        2496832  org.apache.skywalking.apm.dependencies.io.grpc.netty.NettyChannelBuilder$NettyTransportFactory
        57:         39013        1872624  org.apache.skywalking.apm.dependencies.io.grpc.internal.Rescheduler
        94:         39013        1248416  org.apache.skywalking.apm.dependencies.io.grpc.internal.CensusStatsModule
        95:         39013        1248416  org.apache.skywalking.apm.dependencies.io.grpc.internal.CensusTracingModule
        96:         39013        1248416  org.apache.skywalking.apm.dependencies.io.grpc.internal.ManagedChannelImpl$1
        97:         39013        1248416  org.apache.skywalking.apm.dependencies.io.grpc.internal.ManagedChannelImpl$UncommittedRetriableStreamsRegistry
        98:         39013        1248416  org.apache.skywalking.apm.dependencies.io.grpc.internal.ServiceConfigInterceptor
        99:         39013        1248416  org.apache.skywalking.apm.dependencies.io.netty.channel.AdaptiveRecvByteBufAllocator
        128:         39013         936312  org.apache.skywalking.apm.dependencies.io.grpc.internal.AtomicBackoff
        129:         39013         936312  org.apache.skywalking.apm.dependencies.io.grpc.internal.CallCredentialsApplyingTransportFactory
        130:         39013         936312  org.apache.skywalking.apm.dependencies.io.grpc.internal.CensusStatsModule$1
        131:         39013         936312  org.apache.skywalking.apm.dependencies.io.grpc.internal.CensusStatsModule$StatsClientInterceptor
        132:         39013         936312  org.apache.skywalking.apm.dependencies.io.grpc.internal.CensusTracingModule$1
       ```
       - skywalking agent code fragment
       ```
       package org.apache.skywalking.apm.agent.core.remote;
       public class GRPCChannelManager ... {
         ...
         @Override
          public void run() {
              logger.debug("Selected collector grpc service running, reconnect:{}.", reconnect);
              if (reconnect) {
                  if (RemoteDownstreamConfig.Collector.GRPC_SERVERS.size() > 0) {
                      String server = "";
                      try {
                          int index = Math.abs(random.nextInt()) % RemoteDownstreamConfig.Collector.GRPC_SERVERS.size();
                          server = RemoteDownstreamConfig.Collector.GRPC_SERVERS.get(index);
                          String[] ipAndPort = server.split(":");

                          managedChannel = GRPCChannel.newBuilder(ipAndPort[0], Integer.parseInt(ipAndPort[1]))
                              .addManagedChannelBuilder(new StandardChannelBuilder())
                              .addManagedChannelBuilder(new TLSChannelBuilder())
                              .addChannelDecorator(new AuthenticationDecorator())
                              .build();

                          if (!managedChannel.isShutdown() && !managedChannel.isTerminated()) {
                              reconnect = false;
                              notify(GRPCChannelStatus.CONNECTED);
                          } else {
                              notify(GRPCChannelStatus.DISCONNECT);
                          }
                          return;
                      } catch (Throwable t) {
                          logger.error(t, "Create channel to {} fail.", server);
                          notify(GRPCChannelStatus.DISCONNECT);
                      }
                  }

                  logger.debug("Selected collector grpc service is not available. Wait {} seconds to retry", Config.Collector.GRPC_CHANNEL_CHECK_INTERVAL);
              }
          }
         ...
       }
       ```
       - if the original gRPC channel cannot connect to the collector (and yes, the collector in this environment had been dead for more than 10 days... :( ), then:
         1. the original channel is never shut down
         2. a new channel is created on each reconnect attempt
         3. the abandoned channels accumulate, which leaks memory
         4. eventually GC cannot reclaim enough memory and the JVM runs out of memory
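
One possible direction for a fix, sketched under assumptions rather than taken from the agent code: shut the previous channel down whenever a new one replaces it. To keep the example runnable without gRPC on the classpath, `Channel` is a hypothetical stand-in for the few `ManagedChannel` methods involved (`shutdownNow()`, `awaitTermination()`, `isTerminated()`), and `ChannelHolder` and `FakeChannel` are illustrative names, not SkyWalking classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

public class ChannelReconnectSketch {

    /** Hypothetical stand-in for the subset of io.grpc.ManagedChannel used here. */
    interface Channel {
        void shutdownNow();
        boolean awaitTermination(long timeout, TimeUnit unit) throws InterruptedException;
        boolean isTerminated();
    }

    /** Fake channel that only records whether it has been shut down. */
    static final class FakeChannel implements Channel {
        private volatile boolean terminated;

        @Override
        public void shutdownNow() {
            terminated = true;
        }

        @Override
        public boolean awaitTermination(long timeout, TimeUnit unit) {
            return terminated;
        }

        @Override
        public boolean isTerminated() {
            return terminated;
        }
    }

    /** Holds at most one live channel; replacing it shuts the old one down. */
    static final class ChannelHolder {
        private Channel current;

        synchronized void replace(Channel next) throws InterruptedException {
            Channel old = current;
            current = next;
            if (old != null) {
                // Release the old channel instead of dropping the reference.
                old.shutdownNow();
                old.awaitTermination(5, TimeUnit.SECONDS);
            }
        }

        synchronized Channel current() {
            return current;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ChannelHolder holder = new ChannelHolder();
        List<FakeChannel> created = new ArrayList<>();

        // Simulate many reconnect attempts against a dead collector.
        for (int i = 0; i < 1000; i++) {
            FakeChannel channel = new FakeChannel();
            created.add(channel);
            holder.replace(channel);
        }

        long live = created.stream().filter(c -> !c.isTerminated()).count();
        System.out.println("live channels: " + live); // prints "live channels: 1"
    }
}
```

With this pattern only the most recent channel stays live, no matter how many reconnect attempts happen. In `GRPCChannelManager.run()` the equivalent change would presumably be to shut down the existing `managedChannel` before assigning the newly built one.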
   

