> then why don’t you handle the connection state as well? isn’t that a simple fix?
VDSM socket availability during startup is probably the most important requirement for MOM, and the whole service is built around that assumption. We could handle it differently, but letting the service crash saves us tons of code (the failure should not happen in the first place). We simply do not need code that decides between a permanent and a temporary failure during startup.

XML-RPC was easy because it was stateless (new request for every call). JSON-RPC is a bit harder because the client keeps the socket internally. Does the client reconnect by itself, btw? What happens when we use it after a socket error?

Martin

On Fri, Nov 18, 2016 at 12:55 PM, Michal Skrivanek <[email protected]> wrote:
>
>> On 18 Nov 2016, at 12:35, Martin Sivak <[email protected]> wrote:
>>
>>> I don't think it is related to version X or Y. It is a race, so might be
>>> related to other factors.
>>
>> It never (seriously: NEVER) happened with xml-rpc before 4.0.5.
>
> that is surprising
> but we also didn’t have lago before;-)
>
>>
>>> likely because json-rpc is initialized after xml-rpc….or indeed whatever
>>> else;-)
>>
>> But this is not about jsonrpc. The socket itself is shared according
>> to what Piotr said.
>
> it is
>
>>
>>> btw you likely still want to have a retry in mom once it
>>> starts responding due to delayed vdsm async recovery taking potentially
>>> minutes
>>
>> We handle this already. The only issue is the connection refused state.
>
> then why don’t you handle the connection state as well? isn’t that a simple
> fix?
>
>>
>>
>> Martin
>>
>>
>> On Fri, Nov 18, 2016 at 12:19 PM, Michal Skrivanek
>> <[email protected]> wrote:
>>>
>>> On 18 Nov 2016, at 12:12, Oved Ourfali <[email protected]> wrote:
>>>
>>> I don't think it is related to version X or Y. It is a race, so might be
>>> related to other factors.
>>>
>>>
>>> likely because json-rpc is initialized after xml-rpc….or indeed whatever
>>> else;-)
>>>
>>> either way it needs to be solved.
>>> Either by improving the systemd service
>>> file or mom retry (btw you likely still want to have a retry in mom once it
>>> starts responding due to delayed vdsm async recovery taking potentially
>>> minutes)
>>>
>>>
>>> On Nov 18, 2016 12:59 PM, "Martin Sivak" <[email protected]> wrote:
>>>>
>>>>> Are we / can we use systemd socket activation there?
>>>>
>>>> That actually requires systemd-specific code iirc (to take over the
>>>> standing by socket). I am actually wondering why the xml-rpc in 4.0.4
>>>> was fine and json-rpc in 4.0.6 is too slow.
>>>>
>>>> Martin
>>>>
>>>> On Fri, Nov 18, 2016 at 11:53 AM, Anton Marchukov <[email protected]>
>>>> wrote:
>>>>> Hello All.
>>>>>
>>>>> Are we / can we use systemd socket activation there?
>>>>>
>>>>> Anton.
>>>>>
>>>>> On Fri, Nov 18, 2016 at 11:21 AM, Martin Sivak <[email protected]>
>>>>> wrote:
>>>>>>
>>>>>> What about making vdsm ready to answer connections when it returns to
>>>>>> systemd instead? I hate workarounds and this always worked fine.
>>>>>>
>>>>>> Martin
>>>>>>
>>>>>> On Fri, Nov 18, 2016 at 11:13 AM, Oved Ourfali <[email protected]>
>>>>>> wrote:
>>>>>>> Seems like a race regardless of the protocol.
>>>>>>> Should you add a retry?
>>>>>>>
>>>>>>>
>>>>>>> On Nov 18, 2016 11:52 AM, "Martin Sivak" <[email protected]> wrote:
>>>>>>>>
>>>>>>>> Yes, because VDSM is supposed to be up (there is a systemd
>>>>>>>> dependency).
>>>>>>>> This always worked fine with xml-rpc.
>>>>>>>>
>>>>>>>> Martin
>>>>>>>>
>>>>>>>> On Fri, Nov 18, 2016 at 10:14 AM, Nir Soffer <[email protected]>
>>>>>>>> wrote:
>>>>>>>>> On Fri, Nov 18, 2016 at 10:45 AM, Martin Sivak <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>> This happens because MOM can't connect to VDSM and so it quits.
>>>>>>>>>
>>>>>>>>> So mom tries once to connect, and if the connection fails it quits?
>>>>>>>>>
>>>>>>>>>> We discussed it on the mailing list:
>>>>>>>>>>
>>>>>>>>>> https://lists.fedoraproject.org/archives/list/[email protected]/thread/MZ7UJUWO5KFRDJJDNXX7VIYU5PWSXF62/
>>>>>>>>>> http://lists.ovirt.org/pipermail/devel/2016-November/014101.html
>>>>>>>>>>
>>>>>>>>>> This issue never happened with XML-RPC.
>>>>>>>>>>
>>>>>>>>>> Shira reported it as
>>>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1393012
>>>>>>>>>>
>>>>>>>>>> Martin
>>>>>>>>>>
>>>>>>>>>> On Thu, Nov 17, 2016 at 7:42 PM, Yaniv Kaul <[email protected]>
>>>>>>>>>> wrote:
>>>>>>>>>>> I've recently seen, including now on Master, the following
>>>>>>>>>>> warnings:
>>>>>>>>>>> Nov 17 13:33:25 lago-basic-suite-master-host0 systemd[1]: Started MOM instance configured for VDSM purposes.
>>>>>>>>>>> Nov 17 13:33:25 lago-basic-suite-master-host0 systemd[1]: Starting MOM instance configured for VDSM purposes...
>>>>>>>>>>> Nov 17 13:33:35 lago-basic-suite-master-host0 vdsm[2012]: vdsm MOM WARN MOM not available, Policy could not be set.
>>>>>>>>>>> Nov 17 13:33:39 lago-basic-suite-master-host0 vdsm[2012]: vdsm MOM WARN MOM not available.
>>>>>>>>>>> Nov 17 13:33:39 lago-basic-suite-master-host0 vdsm[2012]: vdsm MOM WARN MOM not available, KSM stats will be missing.
>>>>>>>>>>> Nov 17 13:33:55 lago-basic-suite-master-host0 vdsm[2012]: vdsm MOM WARN MOM not available.
>>>>>>>>>>> Nov 17 13:33:55 lago-basic-suite-master-host0 vdsm[2012]: vdsm MOM WARN MOM not available, KSM stats will be missing.
>>>>>>>>>>> Nov 17 13:34:10 lago-basic-suite-master-host0 vdsm[2012]: vdsm MOM WARN MOM not available.
>>>>>>>>>>> Nov 17 13:34:10 lago-basic-suite-master-host0 vdsm[2012]: vdsm MOM WARN MOM not available, KSM stats will be missing.
>>>>>>>>>>> Nov 17 13:34:26 lago-basic-suite-master-host0 vdsm[2012]: vdsm MOM WARN MOM not available.
>>>>>>>>>>> Nov 17 13:34:26 lago-basic-suite-master-host0 vdsm[2012]: vdsm MOM WARN MOM not available, KSM stats will be missing.
>>>>>>>>>>> Nov 17 13:34:42 lago-basic-suite-master-host0 vdsm[2012]: vdsm MOM WARN MOM not available.
>>>>>>>>>>> Nov 17 13:34:42 lago-basic-suite-master-host0 vdsm[2012]: vdsm MOM WARN MOM not available, KSM stats will be missing.
>>>>>>>>>>> Nov 17 13:34:57 lago-basic-suite-master-host0 vdsm[2012]: vdsm MOM WARN MOM not available.
>>>>>>>>>>> Nov 17 13:34:57 lago-basic-suite-master-host0 vdsm[2012]: vdsm MOM WARN MOM not available, KSM stats will be missing.
>>>>>>>>>>> Nov 17 13:35:12 lago-basic-suite-master-host0 vdsm[2012]: vdsm MOM WARN MOM not available.
>>>>>>>>>>>
>>>>>>>>>>> Any ideas what this is and why?
>>>>>>>>>>>
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> Devel mailing list
>>>>>>>>>>> [email protected]
>>>>>>>>>>> http://lists.ovirt.org/mailman/listinfo/devel
>>>>>
>>>>> --
>>>>> Anton Marchukov
>>>>> Senior Software Engineer - RHEV CI - Red Hat
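Michal's suggestion in the thread, handling the "connection refused" state in MOM, amounts to the temporary-vs-permanent decision Martin says MOM currently avoids. A minimal sketch of what that decision could look like, assuming a plain TCP endpoint; `connect_with_retry`, its parameters and the default 60-second window are hypothetical and not MOM's actual code:

```python
import socket
import time


def connect_with_retry(host, port, deadline_s=60.0, delay_s=1.0):
    """Hypothetical helper: connect to a daemon that may still be starting.

    ECONNREFUSED (nobody listening yet) is treated as temporary and retried
    until deadline_s elapses; any other OSError (bad address, timeout,
    unreachable host) propagates immediately as a permanent failure.
    """
    give_up_at = time.monotonic() + deadline_s
    while True:
        try:
            return socket.create_connection((host, port), timeout=5.0)
        except ConnectionRefusedError:
            if time.monotonic() >= give_up_at:
                raise  # still refused after the startup window: give up
            time.sleep(delay_s)
```

The sketch only covers the initial connect. Whether a long-lived JSON-RPC client reconnects after a later socket error, the open question Martin raises, is a separate problem it does not address.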

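Martin's alternative, making VDSM ready to answer connections by the time it reports startup to systemd, maps onto systemd's readiness notification: with `Type=notify`, systemd treats a unit as started only after the daemon sends `READY=1` to the socket named in `$NOTIFY_SOCKET`, and units ordered `After=` it wait until then. A minimal sketch of the daemon side, assuming a plain TCP listener; `serve` and its host/port are hypothetical, and this is not how VDSM is actually implemented:

```python
import os
import socket


def sd_notify(state: str) -> bool:
    """Send a state string such as "READY=1" to systemd (sd_notify(3) protocol).

    Returns False when NOTIFY_SOCKET is unset, which typically means the
    process was not started by systemd as a Type=notify service.
    """
    addr = os.environ.get("NOTIFY_SOCKET")
    if not addr:
        return False
    if addr.startswith("@"):
        addr = "\0" + addr[1:]  # '@' denotes a Linux abstract-namespace socket
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(state.encode(), addr)
    return True


def serve(host: str, port: int) -> socket.socket:
    """Hypothetical daemon startup: listen first, report readiness second."""
    listener = socket.create_server((host, port))  # bind() + listen()
    # Only now can clients connect without ECONNREFUSED, so only now do we
    # let systemd start the units ordered After= this one.
    sd_notify("READY=1")
    return listener
```

The ordering is the whole point: if readiness were reported before `listen()`, a dependent unit (MOM, per the systemd dependency mentioned in the thread) could still hit the connection-refused window.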