xuet0ng opened a new issue #2190: [Agent] can not connect to the collector maybe cause memory leak
URL: https://github.com/apache/incubator-skywalking/issues/2190

Please answer these questions before submitting your issue.

- Why do you submit this issue?
  - [ ] Question or discussion
  - [x] Bug
  - [ ] Requirement
  - [ ] Feature or performance improvement

___
### Bug

- Which version of SkyWalking, OS and JRE?
  - version: remotes/origin/v5.0.0-KXLGA
  - OS: CentOS 6.5
  - JRE: java version "1.8.0_144"
- Which company or project?
- What happened? If possible, provide a way to reproduce the error, e.g. demo application, component version.

**Background**: Java, Spring Cloud, microservices.

**What happened**: several services on different machines became unavailable at almost the same time.

The Java service log says:

```
java.lang.OutOfMemoryError: GC overhead limit exceeded
```

The SkyWalking agent log says:

```
00:02:21.725 [SkywalkingAgent-2-GRPCChannelManager-0] ERROR o.a.s.a.d.i.g.i.ManagedChannelOrphanWrapper - *~*~*~ Channel ManagedChannelImpl{logId=1042418, target=172.21.16.175:11800} was not shutdown properly!!! ~*~*~* Make sure to call shutdown()/shutdownNow() and wait until awaitTermination() returns true.
java.lang.RuntimeException: ManagedChannel allocation site
	at org.apache.skywalking.apm.dependencies.io.grpc.internal.ManagedChannelOrphanWrapper$ManagedChannelReference.<init>(ManagedChannelOrphanWrapper.java:103)
	at org.apache.skywalking.apm.dependencies.io.grpc.internal.ManagedChannelOrphanWrapper.<init>(ManagedChannelOrphanWrapper.java:53)
	at org.apache.skywalking.apm.dependencies.io.grpc.internal.ManagedChannelOrphanWrapper.<init>(ManagedChannelOrphanWrapper.java:44)
	at org.apache.skywalking.apm.dependencies.io.grpc.internal.AbstractManagedChannelImplBuilder.build(AbstractManagedChannelImplBuilder.java:410)
	at org.apache.skywalking.apm.agent.core.remote.GRPCChannel.<init>(GRPCChannel.java:46)
	at org.apache.skywalking.apm.agent.core.remote.GRPCChannel.<init>(GRPCChannel.java:31)
	at org.apache.skywalking.apm.agent.core.remote.GRPCChannel$Builder.build(GRPCChannel.java:95)
	at org.apache.skywalking.apm.agent.core.remote.GRPCChannelManager.run(GRPCChannelManager.java:99)
	at org.apache.skywalking.apm.util.RunnableWithExceptionProtection.run(RunnableWithExceptionProtection.java:36)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
00:03:21.724 [SkywalkingAgent-2-GRPCChannelManager-0] ERROR o.a.s.a.d.i.g.i.ManagedChannelOrphanWrapper - *~*~*~ Channel ManagedChannelImpl{logId=1042467, target=172.21.16.175:11800} was not shutdown properly!!! ~*~*~* Make sure to call shutdown()/shutdownNow() and wait until awaitTermination() returns true.
```

Abnormal object counts, taken with `jmap -histo:live {pid} > ~/histolive`:

```
 num     #instances         #bytes  class name
----------------------------------------------
[xxx]
[xxx@xxxxxxxxxxxx ~]$ grep '39013' histolive
   4:         39013       20598600  [Lorg.apache.skywalking.apm.dependencies.io.netty.handler.codec.http2.HpackHeaderField;
  11:         39013        8426808  org.apache.skywalking.apm.dependencies.io.grpc.internal.ManagedChannelImpl
  30:         39013        3745248  org.apache.skywalking.apm.dependencies.io.grpc.internal.TransportTracer
  40:         39013        2808936  org.apache.skywalking.apm.dependencies.io.netty.channel.DefaultChannelHandlerContext
  47:         39013        2496832  org.apache.skywalking.apm.dependencies.io.grpc.internal.DelayedClientTransport
  48:         39013        2496832  org.apache.skywalking.apm.dependencies.io.grpc.netty.NettyChannelBuilder$NettyTransportFactory
  57:         39013        1872624  org.apache.skywalking.apm.dependencies.io.grpc.internal.Rescheduler
  94:         39013        1248416  org.apache.skywalking.apm.dependencies.io.grpc.internal.CensusStatsModule
  95:         39013        1248416  org.apache.skywalking.apm.dependencies.io.grpc.internal.CensusTracingModule
  96:         39013        1248416  org.apache.skywalking.apm.dependencies.io.grpc.internal.ManagedChannelImpl$1
  97:         39013        1248416  org.apache.skywalking.apm.dependencies.io.grpc.internal.ManagedChannelImpl$UncommittedRetriableStreamsRegistry
  98:         39013        1248416  org.apache.skywalking.apm.dependencies.io.grpc.internal.ServiceConfigInterceptor
  99:         39013        1248416  org.apache.skywalking.apm.dependencies.io.netty.channel.AdaptiveRecvByteBufAllocator
 128:         39013         936312  org.apache.skywalking.apm.dependencies.io.grpc.internal.AtomicBackoff
 129:         39013         936312  org.apache.skywalking.apm.dependencies.io.grpc.internal.CallCredentialsApplyingTransportFactory
 130:         39013         936312  org.apache.skywalking.apm.dependencies.io.grpc.internal.CensusStatsModule$1
 131:         39013         936312  org.apache.skywalking.apm.dependencies.io.grpc.internal.CensusStatsModule$StatsClientInterceptor
 132:         39013         936312  org.apache.skywalking.apm.dependencies.io.grpc.internal.CensusTracingModule$1
```

The SkyWalking agent code fragment:

```java
package org.apache.skywalking.apm.agent.core.remote;

public class GRPCChannelManager ... {
    ...
    @Override
    public void run() {
        logger.debug("Selected collector grpc service running, reconnect:{}.", reconnect);
        if (reconnect) {
            if (RemoteDownstreamConfig.Collector.GRPC_SERVERS.size() > 0) {
                String server = "";
                try {
                    int index = Math.abs(random.nextInt()) % RemoteDownstreamConfig.Collector.GRPC_SERVERS.size();
                    server = RemoteDownstreamConfig.Collector.GRPC_SERVERS.get(index);
                    String[] ipAndPort = server.split(":");
                    managedChannel = GRPCChannel.newBuilder(ipAndPort[0], Integer.parseInt(ipAndPort[1]))
                        .addManagedChannelBuilder(new StandardChannelBuilder())
                        .addManagedChannelBuilder(new TLSChannelBuilder())
                        .addChannelDecorator(new AuthenticationDecorator())
                        .build();
                    if (!managedChannel.isShutdown() && !managedChannel.isTerminated()) {
                        reconnect = false;
                        notify(GRPCChannelStatus.CONNECTED);
                    } else {
                        notify(GRPCChannelStatus.DISCONNECT);
                    }
                    return;
                } catch (Throwable t) {
                    logger.error(t, "Create channel to {} fail.", server);
                    notify(GRPCChannelStatus.DISCONNECT);
                }
            }
            logger.debug("Selected collector grpc service is not available. Wait {} seconds to retry", Config.Collector.GRPC_CHANNEL_CHECK_INTERVAL);
        }
    }
    ...
}
```

If the underlying gRPC channel cannot connect to the collector (yes, the collector in this environment had been dead for more than 10 days... :( ), then on every retry:

1. the previous channel is never shut down;
2. a new channel is created;
3. the orphaned channels accumulate, which is a memory leak;
4. eventually GC can no longer reclaim memory, and the JVM runs out of memory.
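The reconnect loop above can be modeled with a minimal, self-contained sketch. `FakeChannel` and `ChannelManager` here are stand-ins invented for illustration, not SkyWalking or gRPC classes; the point is the one missing step: releasing the old channel before building a replacement, as the orphan-wrapper warning in the log suggests.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for gRPC's ManagedChannel: records whether it was released.
class FakeChannel {
    boolean shutdown = false;
    void shutdownNow() { shutdown = true; }
}

// Models the agent's reconnect loop. Every retry builds a new channel,
// but first shuts down the previous one, so failed attempts cannot pile up.
class ChannelManager {
    final List<FakeChannel> allocated = new ArrayList<>();
    FakeChannel current;

    void reconnect() {
        if (current != null) {
            current.shutdownNow();   // the step missing in the reported code
        }
        current = new FakeChannel();
        allocated.add(current);
    }

    // Channels that were replaced but never released, i.e. the leak.
    long leakedChannels() {
        return allocated.stream()
            .filter(c -> !c.shutdown && c != current)
            .count();
    }
}

public class LeakDemo {
    public static void main(String[] args) {
        ChannelManager m = new ChannelManager();
        for (int i = 0; i < 1000; i++) {
            m.reconnect();           // collector stays down; retry every interval
        }
        System.out.println(m.leakedChannels()); // prints 0
    }
}
```

Without the `shutdownNow()` call, `leakedChannels()` would grow by one per retry, matching the 39013 `ManagedChannelImpl` instances in the histogram. In the real agent a fix along these lines would shut down the existing `managedChannel` (`shutdown()`/`shutdownNow()` plus `awaitTermination()`, per the log's warning) in `GRPCChannelManager.run()` before building the replacement.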
