On 31/10/2014 10:26, Jaicel wrote:
> I've increased the limit and then restarted the agent and broker. The status
> normalized, but right now it has gone to the "False" state again, although both
> hosts still have a score of 2400. The agent log remains the same, with an
> "ovirt-ha-agent dead but subsys locked" status. The ha-broker log is below:
>
> Thread-138::INFO::2014-10-31 17:24:22,981::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
> Thread-138::INFO::2014-10-31 17:24:22,991::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
> Thread-139::INFO::2014-10-31 17:24:38,385::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
> Thread-139::INFO::2014-10-31 17:24:38,395::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
> Thread-140::INFO::2014-10-31 17:24:53,816::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
> Thread-140::INFO::2014-10-31 17:24:53,827::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
> Thread-141::INFO::2014-10-31 17:25:09,172::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
> Thread-141::INFO::2014-10-31 17:25:09,182::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
> Thread-142::INFO::2014-10-31 17:25:24,551::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
> Thread-142::INFO::2014-10-31 17:25:24,562::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
>
> Thanks,
> Jaicel
>
> ----- Original Message -----
> From: "Jiri Moskovcak" <[email protected]>
> To: "Jaicel R. Sabonsolin" <[email protected]>, "Niels de Vos" <[email protected]>
> Cc: "Vijay Bellur" <[email protected]>, [email protected], "Gluster Devel" <[email protected]>
> Sent: Friday, October 31, 2014 4:32:02 PM
> Subject: Re: [ovirt-users] Hosted-Engine HA problem
>
> On 10/31/2014 03:53 AM, Jaicel R. Sabonsolin wrote:
>> Hi guys,
>>
>> These logs appear on both hosts, just like the result of --vm-status. I tried
>> tcpdump on the oVirt hosts and Gluster nodes, but only packets exchanged with
>> my monitoring VM (Zabbix) appeared.
>>
>> agent.log
>>     new_data = self.refresh(self._state.data)
>>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/state_machine.py", line 77, in refresh
>>     stats.update(self.hosted_engine.collect_stats())
>>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 662, in collect_stats
>>     constants.SERVICE_TYPE)
>>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 171, in get_stats_from_storage
>>     result = self._checked_communicate(request)
>>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 199, in _checked_communicate
>>     .format(message or response))
>> RequestError: Request failed: <type 'exceptions.OSError'>
>>
>> broker.log
>>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 165, in handle
>>     response = "success " + self._dispatch(data)
>>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 261, in _dispatch
>>     .get_all_stats_for_service_type(**options)
>>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 41, in get_all_stats_for_service_type
>>     d = self.get_raw_stats_for_service_type(storage_dir, service_type)
>>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 74, in get_raw_stats_for_service_type
>>     f = os.open(path, direct_flag | os.O_RDONLY)
>> OSError: [Errno 24] Too many open files: '/rhev/data-center/mnt/gluster1:_engine/6eb220be-daff-4785-8f78-111cc24139c4/ha_agent/hosted-engine.metadata'
>
> - ah, there we go ^^^^^^ you might need to tweak the limit of allowed
> open files as described here [1], or find out why the app keeps so many files open
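To illustrate why a leaked descriptor eventually surfaces as exactly this `OSError: [Errno 24] Too many open files` from `os.open()`, here is a minimal Python 3 sketch. It is not the broker's code: `read_metadata` and `calls_until_emfile` are hypothetical helpers. It temporarily lowers the process's soft `RLIMIT_NOFILE`, then reads a file repeatedly, either closing each descriptor in a `finally` block or deliberately leaking it. The `O_DIRECT` flag the broker passes is left out, because it requires aligned buffers and is not supported on every filesystem.

```python
import errno
import os
import resource


def read_metadata(path, leaked=None):
    """Read the start of a file through a raw descriptor (hypothetical helper).

    If `leaked` is a list, the descriptor is appended to it and NOT closed here,
    simulating a descriptor leak.
    """
    fd = os.open(path, os.O_RDONLY)  # O_DIRECT omitted: it needs aligned buffers
    if leaked is not None:
        leaked.append(fd)
        return os.read(fd, 4096)
    try:
        return os.read(fd, 4096)
    finally:
        os.close(fd)  # always release the descriptor, even if read() fails


def calls_until_emfile(path, leak, soft_limit=128, max_calls=1000):
    """Return the call number that failed with EMFILE, or None if none did.

    Temporarily lowers this process's soft RLIMIT_NOFILE so the failure shows up
    quickly, then restores the limit and closes any leaked descriptors.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        soft_limit = min(soft_limit, hard)
    leaked = [] if leak else None
    resource.setrlimit(resource.RLIMIT_NOFILE, (soft_limit, hard))
    try:
        for call in range(1, max_calls + 1):
            try:
                read_metadata(path, leaked)
            except OSError as exc:
                if exc.errno == errno.EMFILE:  # errno 24: "Too many open files"
                    return call
                raise
        return None
    finally:
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))
        for fd in leaked or []:
            os.close(fd)


if __name__ == "__main__":
    import tempfile

    with tempfile.NamedTemporaryFile() as tmp:
        tmp.write(b"metadata")
        tmp.flush()
        print("closing each fd:", calls_until_emfile(tmp.name, leak=False))
        print("leaking each fd:", calls_until_emfile(tmp.name, leak=True))
```

With the close in place the loop never fails; in leak mode the failing call number is roughly the soft limit minus the descriptors the process already had open. If descriptors really are leaking, raising the limit only postpones the error.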
It would be nice to understand whether this is related to that host only, or
whether it is a common case and we should increase the limit as part of the
setup. I've never seen this issue before.

>
>
> --Jirka
>
> [1]
> http://www.cyberciti.biz/faq/linux-increase-the-maximum-number-of-open-files/
>
>> Thread-38160::INFO::2014-10-31 10:28:37,989::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
>> Thread-38161::INFO::2014-10-31 10:28:53,656::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
>> Thread-38161::ERROR::2014-10-31 10:28:53,657::listener::190::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Error handling request, data: 'get-stats storage_dir=/rhev/data-center/mnt/gluster1:_engine/6eb220be-daff-4785-8f78-111cc24139c4/ha_agent service_type=hosted-engine'
>> Traceback (most recent call last):
>>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 165, in handle
>>     response = "success " + self._dispatch(data)
>>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 261, in _dispatch
>>     .get_all_stats_for_service_type(**options)
>>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 41, in get_all_stats_for_service_type
>>     d = self.get_raw_stats_for_service_type(storage_dir, service_type)
>>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 74, in get_raw_stats_for_service_type
>>     f = os.open(path, direct_flag | os.O_RDONLY)
>> OSError: [Errno 24] Too many open files: '/rhev/data-center/mnt/gluster1:_engine/6eb220be-daff-4785-8f78-111cc24139c4/ha_agent/hosted-engine.metadata'
>> Thread-38161::INFO::2014-10-31 10:28:53,658::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
>>
>> Thanks,
>> Jaicel
>>
>> ----- Original Message -----
>> From: "Niels de Vos" <[email protected]>
>> To: "Vijay Bellur" <[email protected]>
>> Cc: "Jiri Moskovcak" <[email protected]>, "Jaicel R. Sabonsolin" <[email protected]>, [email protected], "Gluster Devel" <[email protected]>
>> Sent: Friday, October 31, 2014 4:11:25 AM
>> Subject: Re: [ovirt-users] Hosted-Engine HA problem
>>
>> On Thu, Oct 30, 2014 at 09:07:24PM +0530, Vijay Bellur wrote:
>>> On 10/30/2014 06:45 PM, Jiri Moskovcak wrote:
>>>> On 10/30/2014 09:22 AM, Jaicel R. Sabonsolin wrote:
>>>>> Hi Guys,
>>>>>
>>>>> I need help with my oVirt Hosted-Engine HA setup. I am running 2 oVirt
>>>>> hosts and 2 Gluster nodes with replicated volumes. I already have VMs
>>>>> running on my hosts, and they migrate normally when I, for example,
>>>>> power off the host they are running on. The problem is that the engine
>>>>> can't migrate when I switch off the host that runs the engine.
>>>>>
>>>>> oVirt 3.4.3-1.el6
>>>>> KVM 0.12.1.2 - 2.415.el6_5.10
>>>>> LIBVIRT libvirt-0.10.2-29.el6_5.9
>>>>> VDSM vdsm-4.14.17-0.el6
>>>>>
>>>>>
>>>>> Right now, I get this result from hosted-engine --vm-status:
>>>>>
>>>>>   File "/usr/lib64/python2.6/runpy.py", line 122, in _run_module_as_main
>>>>>     "__main__", fname, loader, pkg_name)
>>>>>   File "/usr/lib64/python2.6/runpy.py", line 34, in _run_code
>>>>>     exec code in run_globals
>>>>>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_setup/vm_status.py", line 111, in <module>
>>>>>     if not status_checker.print_status():
>>>>>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_setup/vm_status.py", line 58, in print_status
>>>>>     all_host_stats = ha_cli.get_all_host_stats()
>>>>>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/client/client.py", line 137, in get_all_host_stats
>>>>>     return self.get_all_stats(self.StatModes.HOST)
>>>>>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/client/client.py", line 86, in get_all_stats
>>>>>     constants.SERVICE_TYPE)
>>>>>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 171, in get_stats_from_storage
>>>>>     result = self._checked_communicate(request)
>>>>>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 199, in _checked_communicate
>>>>>     .format(message or response))
>>>>> ovirt_hosted_engine_ha.lib.exceptions.RequestError: Request failed: <type 'exceptions.OSError'>
>>>>>
>>>>>
>>>>> Restarting ha-broker and ha-agent normalizes the status, but eventually
>>>>> it becomes "false" and then returns to the result above. I hope you
>>>>> guys can help me with this.
>>>>>
>>>>
>>>> Hi Jaicel,
>>>> please attach agent.log and broker.log from the host where you are trying
>>>> to run hosted-engine --vm-status. I have a feeling that you ran into a
>>>> known problem on Gluster: a stalled file descriptor. In that case the
>>>> only known solution at this time is to restart the broker & agent, as you
>>>> have already found out.
>>>>
>>>
>>> Adding Niels and gluster-devel to troubleshoot from the Gluster NFS perspective.
>>
>> I'd welcome any details on this "stalled file descriptor" problem. Is
>> there a bug filed with some details like logs, sysrq-t and maybe even
>> tcpdumps? If there is an easy way to reproduce this behaviour, I can
>> surely look into it and hopefully come up with some advice or a fix.
>>
>> Thanks,
>> Niels
>>
> _______________________________________________
> Users mailing list
> [email protected]
> http://lists.ovirt.org/mailman/listinfo/users
>

--
Sandro Bonazzola
Better technology. Faster innovation. Powered by community collaboration.
See how it works at redhat.com
_______________________________________________
Gluster-devel mailing list
[email protected]
http://supercolony.gluster.org/mailman/listinfo/gluster-devel
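Whether the problem is specific to one host, and whether the broker is leaking descriptors or simply needs a higher limit, can be checked by counting what the process actually holds open. The following is a minimal, Linux-only Python 3 sketch that reads `/proc/<pid>/fd` and `/proc/<pid>/limits`; the function names are hypothetical, reading another process's descriptors generally requires root or the same user, and finding the broker's PID (for example with `pgrep -f ovirt-ha-broker`, assuming its command line contains that name) is left to the caller.

```python
import os
import sys
from collections import Counter


def open_fd_targets(pid):
    """Map each open descriptor of `pid` to the path it points at (Linux /proc only)."""
    fd_dir = "/proc/%d/fd" % pid
    targets = {}
    for name in os.listdir(fd_dir):
        try:
            targets[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
    return targets


def max_open_files(pid):
    """Return the (soft, hard) 'Max open files' limits as strings, or None."""
    with open("/proc/%d/limits" % pid) as limits:
        for line in limits:
            if line.startswith("Max open files"):
                # e.g. "Max open files  1024  4096  files" -> fields 3 and 4
                fields = line.split()
                return fields[3], fields[4]
    return None


def summarize(pid):
    """Return (number of open fds, (soft, hard) limit, Counter of fd targets)."""
    targets = open_fd_targets(pid)
    return len(targets), max_open_files(pid), Counter(targets.values())


if __name__ == "__main__":
    pid = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    total, limit, counts = summarize(pid)
    print("open descriptors: %d, limit (soft, hard): %s" % (total, limit))
    for target, n in counts.most_common(5):
        print("%6d  %s" % (n, target))
```

If, over time, most descriptors resolve to the same `hosted-engine.metadata` path and the count keeps growing, that points to descriptors not being closed rather than to a limit that is merely too low.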
