nisiyong opened a new issue #6703:
URL: https://github.com/apache/skywalking/issues/6703
Please answer these questions before submitting your issue.
- Why do you submit this issue?
- [ ] Question or discussion
- [ ] Bug
- [ ] Requirement
- [x] Feature or performance improvement
___
### Requirement or improvement
The SkyWalking Java Agent is a powerful language instrumentation tool; it makes
building our tracing system much easier.
We have been running SkyWalking with our Java applications in production for
several months, and it mostly runs fine. Recently, we found that some
applications suffer from frequent GC and some run into OOM. We dumped the heap
and analyzed it with [Memory Analyzer (MAT)](https://www.eclipse.org/mat/),
which showed a large number of `TraceSegmentRef` objects in the heap. Here are
two cases:
#### Case 1: Frequent GC
In this case, the app has 1000 Dubbo handler threads, and each handler performs
a lot of RPCs and DB operations.
- JVM Max Heap: 8g
- Machine: 8 core 16g
- SkyWalking Agent: 8.4.0, collect all traces


#### Case 2: OOM
In this case, the app has 20 RocketMQ consumer threads, and each consumer
thread performs some RPCs and DB operations.
- JVM Max Heap: 8g
- Machine: 8 core 16g
- SkyWalking Agent: 8.4.0, collect all traces


---
On the application side, I think there are three reasons:
1. Sudden high throughput keeps all threads busy handling requests.
2. Each request performs many RPCs and DB operations, which creates a lot of
spans.
3. Requests are handled slowly; some take 10s or even longer.
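The three factors multiply rather than add. A rough back-of-envelope sketch, assuming Little's law (in-flight requests = arrival rate × latency, capped by the thread pool) and a hypothetical fixed retained size per span; `SpanMemoryEstimate` and every input other than the Case 1 thread count are illustrative placeholders, not measurements:

```java
// Back-of-envelope estimate of span data pinned in thread-locals while
// requests are in flight. All inputs except the pool size are hypothetical.
public final class SpanMemoryEstimate {

    /** Little's law: in-flight requests = arrival rate x latency, capped by the pool. */
    static long concurrentRequests(double arrivalsPerSecond, double latencySeconds, long poolSize) {
        long inFlight = (long) Math.ceil(arrivalsPerSecond * latencySeconds);
        return Math.min(inFlight, poolSize);
    }

    /** Bytes that stay reachable until the in-flight requests finish. */
    static long retainedBytes(long busyThreads, long spansPerRequest, long bytesPerSpan) {
        return busyThreads * spansPerRequest * bytesPerSpan;
    }

    public static void main(String[] args) {
        // 1000 handler threads as in Case 1; 200 req/s, 10 s latency,
        // 200 spans/request and 500 bytes/span are made-up numbers.
        long busy = concurrentRequests(200, 10, 1_000);
        long bytes = retainedBytes(busy, 200, 500);
        System.out.printf("%d busy threads, ~%d MiB retained%n", busy, bytes / (1024 * 1024));
    }
}
```

With these placeholder inputs every one of the 1000 threads is busy, and slow requests (reason 3) keep that pinned data alive for the whole request duration, which is exactly when a burst of traffic (reason 1) arrives.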
On the agent side, I have read the source code and understand some of the
design:
- In SkyWalking's terminology, a `Segment` is the object placed in the
client-side RingBuffer; a consumer thread reads from the RingBuffer and sends
the data to the OAP.
- Before a `Segment` is put into the RingBuffer, it must be built first.
Each request creates some spans, which are pushed onto a stack, and the
`Segment` is not finished until the stack is empty, which means the request has
completed in the application. This can take some time. Meanwhile, the data is
kept in the thread-local, and the garbage collector cannot reclaim it until the
request finishes.
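The lifecycle described above can be modeled roughly as follows. This is a simplified sketch, not SkyWalking's code: `SegmentLifecycle`, `Context`, `Span`, `Segment`, and `BUFFER` are hypothetical names, and a bounded `ArrayBlockingQueue` stands in for the agent's RingBuffer. It shows why every finished span stays reachable from the thread-local until the entry span closes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Simplified model of the per-thread segment lifecycle.
// All names are hypothetical stand-ins, not SkyWalking's real classes.
public final class SegmentLifecycle {

    record Span(String operation) {}

    /** Everything one thread recorded for one request. */
    record Segment(List<Span> spans) {}

    /** Bounded stand-in for the agent's ring buffer. */
    static final BlockingQueue<Segment> BUFFER = new ArrayBlockingQueue<>(1024);

    /** Per-thread state: the active-span stack plus already finished spans. */
    static final class Context {
        final Deque<Span> active = new ArrayDeque<>();
        final List<Span> finished = new ArrayList<>();
    }

    static final ThreadLocal<Context> CONTEXT = ThreadLocal.withInitial(Context::new);

    static void startSpan(String operation) {
        CONTEXT.get().active.push(new Span(operation));
    }

    static void stopSpan() {
        Context ctx = CONTEXT.get();
        ctx.finished.add(ctx.active.pop());
        // Only an empty stack means the request is over and the segment is built.
        // Until then, every finished span stays reachable from the thread-local.
        if (ctx.active.isEmpty()) {
            BUFFER.offer(new Segment(List.copyOf(ctx.finished))); // dropped if full
            CONTEXT.remove();
        }
    }

    public static void main(String[] args) {
        startSpan("dubbo-entry");
        for (int i = 0; i < 3; i++) {
            startSpan("db-query-" + i);
            stopSpan();
        }
        System.out.println("buffered before entry span ends: " + BUFFER.size()); // 0
        stopSpan();
        System.out.println("buffered after entry span ends: " + BUFFER.size());  // 1
    }
}
```

In this model the three `db-query` spans are already finished but cannot be collected, because the `finished` list hangs off the thread-local until `dubbo-entry` stops.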
I wonder why the segment, rather than individual spans, is put into the ring
buffer. Could we put spans there instead? I am not familiar with the purpose of
the Segment design.
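For comparison, a hypothetical sketch of the span-level alternative asked about here: each span is handed to the buffer as soon as it stops, tagged with a segment id, so the thread-local only holds spans that are still open. `SpanLevelHandoff`, `FinishedSpan`, and the UUID-based segment id are illustrative assumptions; this is not how the agent currently works.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.UUID;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical span-level hand-off: finished spans leave the thread-local
// immediately instead of waiting for the whole segment.
public final class SpanLevelHandoff {

    /** A finished span carries its segment id so a consumer could regroup it. */
    record FinishedSpan(String segmentId, String operation) {}

    static final BlockingQueue<FinishedSpan> BUFFER = new ArrayBlockingQueue<>(1024);

    static final class Context {
        final String segmentId = UUID.randomUUID().toString();
        final Deque<String> active = new ArrayDeque<>();
    }

    static final ThreadLocal<Context> CONTEXT = ThreadLocal.withInitial(Context::new);

    static void startSpan(String operation) {
        CONTEXT.get().active.push(operation);
    }

    static void stopSpan() {
        Context ctx = CONTEXT.get();
        String operation = ctx.active.pop();
        // Only still-open spans remain pinned; this one is handed off now.
        BUFFER.offer(new FinishedSpan(ctx.segmentId, operation)); // dropped if full
        if (ctx.active.isEmpty()) {
            CONTEXT.remove();
        }
    }
}
```

The trade-off in this sketch is that whoever consumes the buffer (or the OAP) would have to reassemble spans by `segmentId`, and child spans arrive before their parent entry span, so per-thread memory drops at the cost of regrouping work elsewhere.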
I know we should improve our application at the same time, but in some
scenarios people can tolerate slow request handling. So what can the SkyWalking
Java Agent do in such extreme scenarios? Application availability is very
important, and none of us want an APM tool to occupy a lot of memory.