Thanks very much for your quick support, Wu Sheng. I'll try your recommended resource configuration and then test again to check whether the problem is resolved.

Thanks & Best Regards

Xiaochao Zhang(James)
DI SW CAS MP EMK DO-CHN
No.7, Xixin Avenue, Chengdu High-Tech Zone Chengdu, China 611731
Cellphone: +86 13980787820
Email: [email protected]

-----Original Message-----
From: Sheng Wu <[email protected]>
Sent: Thursday, January 9, 2020 1:50 PM
To: dev <[email protected]>
Subject: Re: Java agent buffer setting to avoid trace segment abandon

Answer inline.

Zhang, James <[email protected]> wrote on Thursday, January 9, 2020 at 1:35 PM:

> Hi Wu Sheng,
> The storage is an ES cluster with 3 nodes in K8S pods (CPU: 2 cores,
> Memory: 3G) and a 100GB SSD disk.
>

The resources are not enough, especially CPU. Provide at least 6-8 cores and 8-10G of memory per node. Please also add a monitoring plugin for ES.

>
> I also found a lot of "GRPCRemoteClient DEADLINE_EXCEEDED" error logs
> in both the OAP and Java agent logs.
> 2020-01-08 05:38:41,512 -
> org.apache.skywalking.oap.server.core.remote.client.GRPCRemoteClient
> -446965 [grpc-default-executor-2] ERROR [] - DEADLINE_EXCEEDED:
> deadline exceeded after 19999971927ns
> io.grpc.StatusRuntimeException: DEADLINE_EXCEEDED: deadline exceeded
> after 19999971927ns
>

This means the communication between OAP nodes is blocked too, mostly due to the storage performance. Please consider enabling SkyWalking's Prometheus monitoring.

>
> Does that mean the OAP service is not powerful enough to receive
> trace data from massive numbers of Java agent clients?
>

From my understanding, neither OAP nor ES has enough resources.

>
> What's the recommended K8S resource configuration for OAP/ES/UI and the
> Java service agent (buffer) configuration for a production env?
>

There is no specific requirement; it mostly depends on your payload. I can share that one user with 10b traces per day runs 25+ OAP and ES nodes.

>
> P.S.
> The Java services and SkyWalking are deployed into the same K8S cluster.
> Therefore there should be no network bottleneck for trace data traffic.
>

Network would not be the first concern.
Note that SkyWalking is a complex APM system; it is not like Zipkin or Prometheus, so you should expect it to use many more resources than those two.

>
> Thanks & Best Regards
>
> Xiaochao Zhang(James)
>
> -----Original Message-----
> From: Sheng Wu <[email protected]>
> Sent: Wednesday, January 8, 2020 5:35 PM
> To: dev <[email protected]>
> Subject: Re: Java agent buffer setting to avoid trace segment abandon
>
> What is the storage?
>
> And clearly, your backend is not powerful enough. 2 cores are only
> enough for a quick startup. Since you don't give it enough resources,
> SkyWalking will work in protection mode, and traces will be ignored.
>
> Sheng Wu 吴晟
> Twitter, wusheng1108
>
>
> Zhang, James <[email protected]> wrote on Wednesday, January 8, 2020 at 5:18 PM:
>
> > Thanks for the quick response, Wu Sheng.
> > Env detail:
> > The Java services are deployed into K8S pods with 18 instances:
> > CPU: 3 cores, Memory: 6G
> > The OAP services are deployed into K8S pods with 3 instances:
> > CPU: 2 cores, Memory: 2G
> >
> > The performance pressure test for the Java services is about 6500 cps.
> >
> > I grepped a single pod's SkyWalking logs for "abandoned" and found
> > 700,000+ records.
> >
> > Thanks & Best Regards
> >
> > Xiaochao Zhang(James)
> >
> > -----Original Message-----
> > From: Sheng Wu <[email protected]>
> > Sent: Wednesday, January 8, 2020 3:49 PM
> > To: dev <[email protected]>
> > Subject: Re: Java agent buffer setting to avoid trace segment
> > abandon
> >
> > Hi Zhang
> >
> > Welcome to the dev ML. How much payload do you put in the tests?
> > Are your backend and storage powerful enough?
> > I would prefer that you share more information about your test env.
> >
> > Sheng Wu 吴晟
> >
> >
> > Zhang, James <[email protected]> wrote on Wednesday, January 8, 2020 at 3:46 PM:
> >
> > > Dear SkyWalking Dev,
> > > I found a lot of "trace segment has been abandoned, cause by
> > > buffer is full" logs in my Java services with the SkyWalking 6.6.0
> > > agent enabled.
> > > DEBUG 2020-01-06 20:43:17:699 http-nio-0.0.0.0-9090-exec-154
> > > TraceSegmentServiceClient : One trace segment has been abandoned,
> > > cause by buffer is full.
> > >
> > > Some "xxx trace segments have been abandoned, cause by no
> > > available channel" logs were also found.
> > > 2020-01-06 21:37:53:716 DataCarrier.DEFAULT.Consumser.0.Thread
> > > TraceSegmentServiceClient : 237 trace segments have been
> > > abandoned, cause by no available channel.
> > >
> > > I checked the source code and documentation and found that the
> > > default buffer setting is 5(channel_size)*300(buffer_size). It
> > > seems that this default is not enough for the production
> > > environment of a heavy-load system.
> > >
> > > To avoid abandoned trace segments, is it OK to just increase
> > > the buffer setting (e.g. to 10*3000)? How can I estimate the
> > > memory (in MB) for a given buffer setting so that I can evaluate
> > > the memory footprint of the segment buffer?
> > >
> > > Thanks & Best Regards
> > >
> > > Xiaochao Zhang(James)
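
The memory question in the original message comes down to simple arithmetic, so here is a rough sketch of it. It treats the buffer as channel_size * buffer_size segment slots, as described in the thread. The function names are hypothetical, and the average segment size of 10 KiB is purely an assumption; the real value depends on span count and tag sizes and has to be measured on your own services.

```python
# Rough sizing sketch for the agent's trace segment buffer.
# All function names here are hypothetical helpers, not SkyWalking APIs.


def buffer_capacity(channel_size: int, buffer_size: int) -> int:
    """Number of segment slots: channel_size channels of buffer_size each."""
    return channel_size * buffer_size


def estimated_buffer_mb(channel_size: int, buffer_size: int,
                        avg_segment_bytes: int) -> float:
    """Approximate heap held by a completely full buffer, in MB."""
    total_bytes = buffer_capacity(channel_size, buffer_size) * avg_segment_bytes
    return total_bytes / (1024 * 1024)


def seconds_until_full(capacity: int, produced_per_sec: float,
                       drained_per_sec: float) -> float:
    """Time before segments start being abandoned.

    Returns infinity when the backend drains at least as fast as the
    service produces segments.
    """
    backlog_rate = produced_per_sec - drained_per_sec
    if backlog_rate <= 0:
        return float("inf")
    return capacity / backlog_rate


# ASSUMPTION: 10 KiB per segment, chosen only for illustration.
AVG_SEGMENT_BYTES = 10 * 1024

if __name__ == "__main__":
    for channels, size in [(5, 300), (10, 3000)]:
        slots = buffer_capacity(channels, size)
        mb = estimated_buffer_mb(channels, size, AVG_SEGMENT_BYTES)
        print(f"{channels}*{size}: {slots} slots, about {mb:.1f} MB when full")
```

Under that assumed segment size, the default 5*300 setting holds 1500 segments and the proposed 10*3000 setting holds 30000. Note that `seconds_until_full` only grows linearly with capacity: if the backend drains more slowly than the services produce, a bigger buffer delays the first abandoned segment but does not prevent it, which fits the advice in the replies to fix OAP and ES resources first.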
