Thanks very much for your quick support, Wu Sheng.

I'll try your recommended resource configuration and then test again to check 
whether the problem is resolved.

Thanks & Best Regards

Xiaochao Zhang(James)
DI SW CAS MP EMK DO-CHN
No.7, Xixin Avenue, Chengdu High-Tech Zone
Chengdu, China  611731
Cellphone: +86 13980787820
Email: [email protected] 

-----Original Message-----
From: Sheng Wu <[email protected]> 
Sent: Thursday, January 9, 2020 1:50 PM
To: dev <[email protected]>
Subject: Re: Java agent buffer setting to avoid trace segment abandon

Answer inline.


On Thu, Jan 9, 2020 at 1:35 PM, Zhang, James <[email protected]> wrote:

> Hi Wu Sheng,
> The storage is an ES cluster with 3 nodes in K8S pods (CPU: 2 cores, Memory:
> 3G) and a 100GB SSD.
>

The resources are not enough, especially the CPU. Please provide at least 6-8 cores 
and 8-10G of memory per node, and please add a monitoring plugin for ES too.


>
> I also found a lot of "GRPCRemoteClient DEADLINE_EXCEEDED" errors in both 
> the OAP and Java agent logs:
> 2020-01-08 05:38:41,512 - org.apache.skywalking.oap.server.core.remote.client.GRPCRemoteClient -446965 [grpc-default-executor-2] ERROR [] - DEADLINE_EXCEEDED: deadline exceeded after 19999971927ns
> io.grpc.StatusRuntimeException: DEADLINE_EXCEEDED: deadline exceeded after 19999971927ns
>

This means the communication between OAP nodes is blocked too, most likely 
because of the storage performance.
Please consider enabling SkyWalking's Prometheus monitoring.
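For reference, the deadline in the log above, 19999971927 ns, is just under 20 seconds, so each failing call waited about 20 s before giving up. A minimal Python conversion (the constant and function names are illustrative only):

```python
# The value is copied from the OAP log line quoted above.
DEADLINE_NS = 19_999_971_927
NS_PER_SECOND = 1_000_000_000


def ns_to_seconds(ns: int) -> float:
    """Convert a duration in nanoseconds to seconds."""
    return ns / NS_PER_SECOND


seconds = ns_to_seconds(DEADLINE_NS)
print(f"{seconds:.3f} s")  # 20.000 s
```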



>
> Does that mean the OAP service is not powerful enough to receive 
> trace data from a large number of Java agent clients?
>

From my understanding, neither OAP nor ES has enough resources.



>
> What are the recommended K8S resource configurations for OAP/ES/UI, and the 
> recommended Java agent (buffer) configuration, for a production environment?
>

There is no specific requirement; it mostly depends on your payload. For reference, 
I can share that one user handling 10 billion traces per day runs 25+ OAP and ES nodes.
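To put that figure in per-second terms, a rough Python calculation follows. It assumes the 10 billion traces are spread evenly over 24 hours, and it treats the "6500 cps" from the load test quoted further down as 6500 calls per second. A call and a trace are not the same unit, so this is an order-of-magnitude comparison only; the function names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_day_to_per_second(count_per_day: float) -> float:
    """Average rate, assuming the daily volume is spread evenly over 24 hours."""
    return count_per_day / SECONDS_PER_DAY


def per_second_to_per_day(count_per_second: float) -> float:
    """Daily volume, assuming the per-second rate is sustained around the clock."""
    return count_per_second * SECONDS_PER_DAY


# Figure from the reply above: a deployment handling 10 billion traces per day.
reference_tps = per_day_to_per_second(10_000_000_000)
# Figure from the load test in the thread, read here as 6500 calls per second.
test_per_day = per_second_to_per_day(6500)

print(f"10 billion traces/day ~ {reference_tps:,.0f} traces/s")
print(f"6500 calls/s ~ {test_per_day:,.0f} calls/day")
```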



>
> P.S.
> The Java services and SkyWalking are deployed in the same K8S cluster, 
> so there should be no network bottleneck for trace data traffic.
>

The network should not be the first concern.

Note that SkyWalking is a complex APM system, unlike Zipkin or Prometheus, 
so you should expect it to use far more resources than those two.



>
> Thanks & Best Regards
>
> Xiaochao Zhang(James)
>
> -----Original Message-----
> From: Sheng Wu <[email protected]>
> Sent: Wednesday, January 8, 2020 5:35 PM
> To: dev <[email protected]>
> Subject: Re: Java agent buffer setting to avoid trace segment abandon
>
> What is the storage?
>
> And clearly, your backend is not powerful enough. 2 cores are only meant for a 
> quick startup. Since you haven't given it enough resources, SkyWalking will 
> work in protection mode, and traces will be ignored.
>
> Sheng Wu 吴晟
> Twitter, wusheng1108
>
>
> On Wed, Jan 8, 2020 at 5:18 PM, Zhang, James <[email protected]> wrote:
>
> > Thanks for the quick response, Wu Sheng.
> > Environment details:
> > The Java services are deployed in K8S pods with 18 instances (CPU: 3 cores, Memory: 6G).
> > The OAP services are deployed in K8S pods with 3 instances (CPU: 2 cores, Memory: 2G).
> >
> > The stress test on the Java services runs at about 6500 cps.
> >
> > I grepped the SkyWalking logs of a single pod for "abandoned" and found 
> > 700,000+ records.
> >
> > Thanks & Best Regards
> >
> > Xiaochao Zhang(James)
> >
> > -----Original Message-----
> > From: Sheng Wu <[email protected]>
> > Sent: Wednesday, January 8, 2020 3:49 PM
> > To: dev <[email protected]>
> > Subject: Re: Java agent buffer setting to avoid trace segment 
> > abandon
> >
> > Hi Zhang
> >
> > Welcome to the dev mailing list. How much payload do you put on the system in your tests?
> > Are your backend and storage powerful enough?
> > Could you share more information about your test environment?
> >
> > Sheng Wu 吴晟
> >
> >
> > On Wed, Jan 8, 2020 at 3:46 PM, Zhang, James <[email protected]> wrote:
> >
> > > Dear SkyWalking Dev,
> > > I found a lot of "trace segment has been abandoned, cause by buffer is full" 
> > > logs in my Java services with the SkyWalking 6.6.0 agent enabled:
> > > DEBUG 2020-01-06 20:43:17:699 http-nio-0.0.0.0-9090-exec-154 TraceSegmentServiceClient : One trace segment has been abandoned, cause by buffer is full.
> > >
> > > I also found some "xxx trace segments have been abandoned, cause by no 
> > > available channel" logs:
> > > 2020-01-06 21:37:53:716 DataCarrier.DEFAULT.Consumser.0.Thread TraceSegmentServiceClient : 237 trace segments have been abandoned, cause by no available channel.
> > >
> > > I checked the source code and documentation and found that the default 
> > > buffer setting is 5 (channel_size) * 300 (buffer_size). It seems that 
> > > this default is not enough for a heavily loaded production environment.
> > >
> > > To avoid abandoned trace segments, is it OK to just increase the 
> > > buffer setting (e.g. to 10 * 3000)? How can I estimate the memory (in MB) 
> > > needed for a given buffer setting, so that I can evaluate the memory 
> > > footprint of the segment buffer?
> > >
> > > Thanks & Best Regards
> > >
> > > Xiaochao Zhang(James)
> > >
> > >
> >
>
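Regarding the buffer question in the first message: the two settings multiply into a fixed number of segment slots, 5 × 300 = 1,500 by default and 10 × 3,000 = 30,000 with the proposed values, and, as the "buffer is full" log suggests, a segment that arrives when its channel is full is dropped instead of waiting. The Python sketch below is a simplified model of that behaviour, not the agent's DataCarrier code. The round-robin channel choice, the class and function names, and AVG_SEGMENT_BYTES (a hypothetical 10 KiB per segment) are all assumptions; the real in-memory size of a segment depends on how many spans and tags it carries.

```python
from typing import Any, List


class BoundedChannelBuffer:
    """Toy model of a multi-channel, fixed-capacity buffer that drops on overflow.

    channel_size and buffer_size mirror the two settings discussed in the thread.
    The round-robin channel choice is a simplifying assumption.
    """

    def __init__(self, channel_size: int, buffer_size: int) -> None:
        self.channel_size = channel_size
        self.buffer_size = buffer_size
        self.channels: List[List[Any]] = [[] for _ in range(channel_size)]
        self.abandoned = 0  # segments dropped because their channel was full
        self._next = 0      # index of the channel that receives the next segment

    @property
    def capacity(self) -> int:
        """Total number of segment slots across all channels."""
        return self.channel_size * self.buffer_size

    def produce(self, segment: Any) -> bool:
        """Offer a segment without blocking; drop it if the chosen channel is full."""
        channel = self.channels[self._next]
        self._next = (self._next + 1) % self.channel_size
        if len(channel) >= self.buffer_size:
            self.abandoned += 1
            return False
        channel.append(segment)
        return True

    def drain(self) -> List[Any]:
        """Simulate the consumer sending every buffered segment to the backend."""
        drained = [segment for channel in self.channels for segment in channel]
        for channel in self.channels:
            channel.clear()
        return drained


def estimate_buffer_mib(channel_size: int, buffer_size: int, avg_segment_bytes: int) -> float:
    """Worst-case memory held by a completely full buffer, in MiB."""
    return channel_size * buffer_size * avg_segment_bytes / (1024 * 1024)


# HYPOTHETICAL average in-memory segment size; measure real segments before relying on it.
AVG_SEGMENT_BYTES = 10 * 1024

if __name__ == "__main__":
    buf = BoundedChannelBuffer(channel_size=5, buffer_size=300)
    # The consumer never drains here, as if the backend were too slow to accept data.
    for i in range(2000):
        buf.produce(i)
    print(buf.capacity, buf.abandoned)  # 1500 500
    for cs, bs in [(5, 300), (10, 3000)]:
        print(cs, bs, round(estimate_buffer_mib(cs, bs, AVG_SEGMENT_BYTES), 2))
    # 5 300 14.65
    # 10 3000 292.97
```

Worst-case memory grows linearly with channel_size × buffer_size, so the 10 × 3000 setting holds 20 times as many segments as the default. A larger buffer only absorbs bursts, though: if the backend stays slower than the agents, as the replies in this thread suggest, the bigger buffer fills up as well, only later.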
