On Mon, Feb 8, 2021 at 6:47 PM Asaf Rachmani <[email protected]> wrote:
>
>
> I didn't check it yet, but maybe this customer bug is related to this
> discussion:
> https://bugzilla.redhat.com/show_bug.cgi?id=1666553
This bug is about the broker opening too many connections to vdsm:
https://bugzilla.redhat.com/show_bug.cgi?id=1666553#c1

If the broker keeps 1000's of connections open, it is likely that vdsm
will run out of fds.

> From vdsm.log:
> 2019-01-15 13:41:11,162+0000 INFO (periodic/2) [vdsm.api] FINISH
> multipath_health return={} from=internal,
> task_id=97c359aa-002e-46d8-9fc5-2477db0909b4 (api:52)
> 2019-01-15 13:41:12,210+0000 WARN (vdsm.Scheduler) [Executor] Worker
> blocked: <Worker name=jsonrpc/0 running <Task <JsonRpcTask {'params': {},
> 'jsonrpc': '2.0', 'method': u'Host.getCapabilities', 'id':
> u'74b9dc62-22b2-4698-9d84-6a71c4f29763'} at 0x7f71dc31b0d0> timeout=60,
> duration=60 at 0x7f71dc31b110> task#=33 at 0x7f722003c890>, traceback:
> File: "/usr/lib64/python2.7/threading.py", line 785, in __bootstrap
> self.__bootstrap_inner()
> File: "/usr/lib64/python2.7/threading.py", line 812, in __bootstrap_inner
> self.run()
> File: "/usr/lib64/python2.7/threading.py", line 765, in run
> self.__target(*self.__args, **self.__kwargs)
> File: "/usr/lib/python2.7/site-packages/vdsm/common/concurrent.py", line 194,
> in run
> ret = func(*args, **kwargs)
> File: "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 301, in _run
> self._execute_task()
> File: "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 315, in
> _execute_task
> task()
> File: "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 391, in
> __call__
> self._callable()
> File: "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 523, in
> __call__
> self._handler(self._ctx, self._req)
> File: "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 566, in
> _serveRequest
> response = self._handle_request(req, ctx)
> File: "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 606, in
> _handle_request
> res = method(**params)
> File: "/usr/lib/python2.7/site-packages/vdsm/rpc/Bridge.py", line 197, in
> _dynamicMethod
> result = fn(*methodArgs)
> File: "<string>", line 2, in getCapabilities
> File: "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 48, in
> method
> ret = func(*args, **kwargs)
> File: "/usr/lib/python2.7/site-packages/vdsm/API.py", line 1337, in
> getCapabilities
> c = caps.get()
> File: "/usr/lib/python2.7/site-packages/vdsm/host/caps.py", line 168, in get
> net_caps = supervdsm.getProxy().network_caps()
> File: "/usr/lib/python2.7/site-packages/vdsm/common/supervdsm.py", line 55,
> in __call__
> return callMethod()
> File: "/usr/lib/python2.7/site-packages/vdsm/common/supervdsm.py", line 53,
> in <lambda>
> **kwargs)
> File: "<string>", line 2, in network_caps
> File: "/usr/lib64/python2.7/multiprocessing/b", line 759, in _callmethod
> kind, result = conn.recv() (executor:363)
>
>
>
> On Mon, Feb 8, 2021 at 5:52 PM Nir Soffer <[email protected]> wrote:
>>
>> On Mon, Feb 8, 2021 at 1:22 PM Yedidyah Bar David <[email protected]> wrote:
>> >
>> > On Mon, Feb 8, 2021 at 9:05 AM Yedidyah Bar David <[email protected]> wrote:
>> > >
>> > > Hi all,
>> > >
>> > > I ran a loop of [1] (from [2]). The loop succeeded for ~ 380
>> > > iterations, then failed with 'Too many open files'.
>> > > First failure was:
>> > >
>> > > 2021-02-08 02:21:15,702+0100 ERROR (jsonrpc/4) [storage.HSM] Could not
>> > > connect to storageServer (hsm:2446)
>> > > Traceback (most recent call last):
>> > >   File "/usr/lib/python3.6/site-packages/vdsm/storage/hsm.py", line
>> > > 2443, in connectStorageServer
>> > >     conObj.connect()
>> > >   File "/usr/lib/python3.6/site-packages/vdsm/storage/storageServer.py",
>> > > line 449, in connect
>> > >     return self._mountCon.connect()
>> > >   File "/usr/lib/python3.6/site-packages/vdsm/storage/storageServer.py",
>> > > line 171, in connect
>> > >     self._mount.mount(self.options, self._vfsType, cgroup=self.CGROUP)
>> > >   File "/usr/lib/python3.6/site-packages/vdsm/storage/mount.py", line
>> > > 210, in mount
>> > >     cgroup=cgroup)
>> > >   File "/usr/lib/python3.6/site-packages/vdsm/common/supervdsm.py",
>> > > line 56, in __call__
>> > >     return callMethod()
>> > >   File "/usr/lib/python3.6/site-packages/vdsm/common/supervdsm.py",
>> > > line 54, in <lambda>
>> > >     **kwargs)
>> > >   File "<string>", line 2, in mount
>> > >   File "/usr/lib64/python3.6/multiprocessing/managers.py", line 772,
>> > > in _callmethod
>> > >     raise convert_to_error(kind, result)
>> > > OSError: [Errno 24] Too many open files
>>
>> Maybe we have a fd leak in supervdsmd?
>>
>> We know that there is a small memory leak in multiprocessing, but not
>> about any fd leak.
>>
>> > > But obviously, once it did, it continued failing for this reason on
>> > > many later operations.
>>
>> Smells like fd leak.
>>
>> > > Is this considered a bug?
>>
>> Generally yes, but the question is if this happens in
>> real-world scenarios.
>>
>> > Do we actively try to prevent such cases?
>>
>> No, we don't have any code monitoring the number of open fds
>> at runtime, or tests checking this in system tests.
>>
>> We do have a health monitor in vdsm:
>> https://github.com/oVirt/vdsm/blob/master/lib/vdsm/health.py
>>
>> It could be useful to also log the number of fds (e.g. ls -lh
>> /proc/pid/fd).
>>
>> We don't have any monitor in supervdsm; it could be useful to add
>> one. supervdsm is relatively simple, but the problem is it runs
>> possibly complex code from vdsm, so "safe" changes in vdsm can
>> cause trouble when the code is run by supervdsm.
>>
>> > > So should I open one and attach logs? Or can it be considered a "corner
>> > > case"?
>>
>> Yes, please open a bug, and include the info you have.
>>
>> Please include the output of "ls -lh /proc/pid/fd" for both vdsm
>> and supervdsm when you reproduce the issue, or during the
>> long test if you cannot reproduce.
>>
>> > > Using vdsm-4.40.50.3-37.git7883b3b43.el8.x86_64 from
>> > > ost-images-el8-he-installed-1-202102021144.x86_64 .
>> > >
>> > > I can also give access to the machine(s) if needed, for now.
>> >
>> > Sorry, now cleaned this env. Can try to reproduce if there is interest.
>>
>> It will help if you can reproduce.
>>
>> Nir
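
As a concrete starting point for the "ls -lh /proc/pid/fd" data requested
above, a small standalone script along the following lines could summarize
the open fds of vdsm and supervdsm. This is a hypothetical helper, not part
of vdsm; the pids are passed on the command line, and it has to run as root
to read the fd directories of system daemons.

    import os
    import sys
    from collections import Counter

    def fd_summary(pid):
        """Return a Counter mapping fd target (path, "socket", "pipe", ...)
        to the number of fds pointing at it, for the given pid."""
        fd_dir = "/proc/%s/fd" % pid
        targets = Counter()
        for fd in os.listdir(fd_dir):
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # fd was closed between listdir() and readlink()
            # Collapse anonymous entries like "socket:[12345]" to their kind,
            # so thousands of leaked sockets show up as one line with a big count.
            if target.startswith(("socket:", "pipe:", "anon_inode:")):
                target = target.split(":", 1)[0]
            targets[target] += 1
        return targets

    if __name__ == "__main__":
        # Usage (as root): python3 fd-summary.py <vdsm-pid> <supervdsm-pid>
        for pid in sys.argv[1:]:
            summary = fd_summary(pid)
            print("pid %s: %d open fds" % (pid, sum(summary.values())))
            for target, count in summary.most_common(10):
                print("  %6d  %s" % (count, target))

Capturing this once early in the loop and once after a few hundred
iterations should make a leak obvious from the growing counts.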
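
For the runtime monitoring mentioned above (vdsm's health monitor, and the
missing one in supervdsm), a periodic fd-count log could look roughly like
the sketch below. This is only an illustration of the idea and does not
reflect the actual health.py API; the logger name, interval, and 80%
threshold are arbitrary choices.

    import logging
    import os
    import resource
    import threading

    log = logging.getLogger("health.fds")

    def _check_fds():
        # Count this process's open fds and compare with the soft RLIMIT_NOFILE,
        # so a slow leak is visible in the log long before EMFILE is hit.
        nfds = len(os.listdir("/proc/self/fd"))
        soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
        log.info("open fds: %d (soft limit: %d)", nfds, soft)
        if soft != resource.RLIM_INFINITY and nfds > soft * 0.8:
            log.warning("open fds at %d, above 80%% of soft limit %d", nfds, soft)

    def start_fd_monitor(interval=60.0):
        # Returns an Event; call .set() on it to stop the monitor thread.
        done = threading.Event()

        def run():
            while not done.wait(interval):
                _check_fds()

        threading.Thread(target=run, name="fd-monitor", daemon=True).start()
        return done

Running the same kind of check inside supervdsmd as well would surface the
Errno 24 condition seen in the traceback above before it starts breaking
mount and network_caps calls.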
