One process started consuming memory rapidly at a certain point and grew to
~27 GB of RSS before we eventually restarted it. This happened after about a
month of running 10 ambari-agent nodes.
(docker)[root@hcube2-1n01 ~]# ps aux | grep ambari_agent
root 39955 0.0 0.0 47580 6024 ? S Aug17 0:00 /usr/bin/python /usr/lib/ambari-agent/lib/ambari_agent/AmbariAgent.py start
root 39959 20.4 10.2 31623096 27154348 ? Sl Aug17 7645:55 /usr/bin/python /usr/lib/ambari-agent/lib/ambari_agent/main.py start
Just before the memory usage started to grow, this exception was logged:
ERROR 2018-09-11 10:56:59,716 websocket.py:552 - Websocket connection was closed with an exception
Traceback (most recent call last):
  File "/usr/lib/ambari-agent/lib/ambari_ws4py/websocket.py", line 549, in run
    if not self.once():
  File "/usr/lib/ambari-agent/lib/ambari_ws4py/websocket.py", line 428, in once
    if not self.process(self.buf[:requested]):
  File "/usr/lib/ambari-agent/lib/ambari_ws4py/websocket.py", line 483, in process
    self.reading_buffer_size = s.parser.send(bytes) or DEFAULT_READING_SIZE
ValueError: generator already executing
This exception was not seen on any of the other nodes, nor on this node at any
other time during the month, so I suspect it is the root cause.
Basically, this error means that the generator is being used by multiple
threads at once. So I will upload a fix that guards this call with a thread
lock.
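The idea of thread-locking a generator can be sketched as follows. This is
only an illustration and not the code from the actual fix: frame_parser and
LockedParser are hypothetical names, and the generator is a simplified
stand-in for the ws4py parser. It assumes the failure comes from several
threads calling send() on the same generator.

```python
import threading


def frame_parser():
    """Hypothetical stand-in for a parser generator: counts bytes received."""
    total = 0
    while True:
        chunk = yield total
        total += len(chunk)


class LockedParser(object):
    """Serializes send() calls so only one thread resumes the generator."""

    def __init__(self, gen):
        self._gen = gen
        self._lock = threading.Lock()
        next(self._gen)  # advance to the first yield so send() can be used

    def send(self, data):
        # Without this lock, two threads calling send() at the same moment
        # can raise "ValueError: generator already executing".
        with self._lock:
            return self._gen.send(data)


parser = LockedParser(frame_parser())
errors = []


def worker():
    try:
        for _ in range(1000):
            parser.send(b"abcd")
    except ValueError as exc:
        errors.append(exc)


threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# An empty chunk adds nothing and just reads back the running total.
total = parser.send(b"")
print("errors: %d, bytes parsed: %d" % (len(errors), total))
```

With the lock in place, every send() runs to the next yield before another
thread can enter the generator, so no ValueError should be collected here.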
This fix is only a guess: it may or may not work, and there is no real way to
test it. But we should definitely try it.
This was also observed in version ambari-2.7.1.0-73.
[ Full content available at: https://github.com/apache/ambari/pull/2318 ]