On Tue, Feb 5, 2019 at 9:25 PM Nir Soffer <[email protected]> wrote:
> On Tue, Feb 5, 2019 at 7:15 PM Milan Zamazal <[email protected]> wrote: > >> Hi, I'm trying to run Vdsm with Python 3, which exposes various Python 3 >> incompatibilities in Vdsm code. I'm currently semi-stuck on Vdsm >> crashing very shortly after startup with: >> >> Panic: Error initializing IRS >> Traceback (most recent call last): >> File "/usr/lib/python3.6/site-packages/vdsm/vdsmd.py", line 95, in >> serve_clients >> irs = Dispatcher(HSM()) >> File "/usr/lib/python3.6/site-packages/vdsm/storage/dispatcher.py", >> line 46, in __init__ >> self._exposeFunctions(obj) >> File "/usr/lib/python3.6/site-packages/vdsm/storage/dispatcher.py", >> line 56, in _exposeFunctions >> if hasattr(funcObj, _EXPORTED_ATTRIBUTE) and callable(funcObj): >> File "/usr/lib/python3.6/site-packages/vdsm/storage/sp.py", line 85, >> in __getattr__ >> raise se.StoragePoolNotConnected >> vdsm.storage.exception.StoragePoolNotConnected: Storage pool not >> connected: () >> > > This should be fixed first. exposing HSM function should not raise this > error. > Can you check if https://gerrit.ovirt.org/c/97590/ fixes this issue? > The best way would be to write tests for the dispatcher and run the tests > with python 3. > When the dispatcher is ported to python 3 you can try again to start vdsm. > > Even if you port entire vdsm to python 3, sanlock does not ship yet > python 3 bindings, so you cannot have any storage except local storage, > iso domain and export domain. > > That wouldn't be a problem by itself, errors are expected in Python 3 >> experiments. However the problem is that this is often the *only* error >> I receive. >> >> I've found out that HSM() call spawns threads, which apparently don't >> have chance to report their errors before the panic above kills Vdsm. >> For instance, there was a string-byte mismatch in running iscsiadm >> scanning commands. Only after I added time.sleep to the end of >> HSM.__init__, I got an error + traceback from that and could see what's >> wrong. That worries me a bit, because it might mean that if some >> initialization external commands are not fast enough, Vdsm may not be >> able to start. >> > > The real fix for this issue to eliminate the storage startup code. > > Storage should do the minimal initialization needed (e.g. start monitoring > udev events) and wait for engine. When engine ask to connect to storage, > vdsm should the needed scanning and lvm initialization that is done now > during startup. > > Then if some storage code fail, only connecting to this kind of storage > would fail and vdsm will not panic. > > As temporary solution you can disable the storage refresh thread > started in HSM.__init__, and set HSM.ready to True immediately. > This will have similar effect to the long term fix. > > After fixing some problems, I get the panic again, even with the sleep >> at the end of HSM.__init__, and again without any other error. It's >> difficult to proceed in such a situation -- any tips what to look at or >> to look for are welcome. >> > > You can disable irs in the config, but then you will have no storage. > Maybe this is good enough at this point. > > Nir > > >> >> FWIW, my vdsm.log now ends with >> >> INFO (MainThread) [storage.check] Starting check service (check:91) >> DEBUG (check/loop) [root] START thread <Thread(check/loop, started >> daemon 140180434425600)> (func=<bound method EventLoop.run_forever of >> <EventLoop running=True closed=False at 0x140181096264704>>, args=(), >> kwargs={}) (concurrent:193) >> INFO (check/loop) [storage.asyncevent] Starting <EventLoop >> running=True closed=False at 0x140181096264704> (asyncevent:125) >> DEBUG (hsm/init) [storage.SamplingMethod] Returning last result >> (misc:386) >> DEBUG (hsm/init) [storage.SamplingMethod] Trying to enter sampling >> method (vdsm.storage.hba.<function rescan at 0x7f7e7a597e18>) (misc:376) >> DEBUG (hsm/init) [storage.SamplingMethod] Got in to sampling method >> (misc:379) >> DEBUG (hsm/init) [storage.HBA] Starting scan (hba:58) >> DEBUG (hsm/init) [storage.HBA] Scan finished (hba:64) >> DEBUG (hsm/init) [storage.SamplingMethod] Returning last result >> (misc:386) >> DEBUG (hsm/init) [root] /usr/bin/taskset --cpu-list 0-0 /sbin/udevadm >> settle --timeout=5 (cwd None) (commands:200) >> DEBUG (hsm/init) [root] SUCCESS: <err> = b''; <rc> = 0 (commands:221) >> DEBUG (hsm/init) [storage.SamplingMethod] Returning last result >> (misc:386) >> DEBUG (hsm/init) [storage.Misc.excCmd] /usr/bin/taskset --cpu-list 0-0 >> /usr/bin/sudo -n /sbin/lvm pvs --config 'devices { >> preferred_names=["^/dev/mapper/"] ignore_suspended_devices=1 >> write_cache_state=0 disable_after_error_count=3 filter=["r|.*|"] } global >> { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 >> use_lvmetad=0 } backup { retain_min=50 retain_days=0 }' --noheadings >> --units b --nosuffix --separator '|' --ignoreskippedcluster -o >> uuid,name,size,vg_name,vg_uuid,pe_start,pe_count,pe_alloc_count,mda_count,dev_size,mda_used_count >> (cwd None) (commands:200) >> DEBUG (hsm/init) [storage.Misc.excCmd] SUCCESS: <err> = b''; <rc> = 0 >> (commands:221) >> DEBUG (hsm/init) [storage.Misc.excCmd] /usr/bin/taskset --cpu-list 0-0 >> /usr/bin/sudo -n /sbin/lvm vgs --config 'devices { >> preferred_names=["^/dev/mapper/"] ignore_suspended_devices=1 >> write_cache_state=0 disable_after_error_count=3 filter=["r|.*|"] } global >> { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 >> use_lvmetad=0 } backup { retain_min=50 retain_days=0 }' --noheadings >> --units b --nosuffix --separator '|' --ignoreskippedcluster -o >> uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name >> (cwd None) (commands:200) >> DEBUG (hsm/init) [storage.Misc.excCmd] SUCCESS: <err> = b''; <rc> = 0 >> (commands:221) >> DEBUG (hsm/init) [storage.Misc.excCmd] /usr/bin/taskset --cpu-list 0-0 >> /usr/bin/sudo -n /sbin/lvm lvs --config 'devices { >> preferred_names=["^/dev/mapper/"] ignore_suspended_devices=1 >> write_cache_state=0 disable_after_error_count=3 filter=["r|.*|"] } global >> { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 >> use_lvmetad=0 } backup { retain_min=50 retain_days=0 }' --noheadings >> --units b --nosuffix --separator '|' --ignoreskippedcluster -o >> uuid,name,vg_name,attr,size,seg_start_pe,devices,tags (cwd None) >> (commands:200) >> DEBUG (hsm/init) [storage.Misc.excCmd] SUCCESS: <err> = b''; <rc> = 0 >> (commands:221) >> DEBUG (hsm/init) [storage.Misc.excCmd] /usr/bin/taskset --cpu-list 0-0 >> /usr/bin/sudo -n /sbin/lvm vgs --config 'devices { >> preferred_names=["^/dev/mapper/"] ignore_suspended_devices=1 >> write_cache_state=0 disable_after_error_count=3 filter=["r|.*|"] } global >> { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 >> use_lvmetad=0 } backup { retain_min=50 retain_days=0 }' --noheadings >> --units b --nosuffix --separator '|' --ignoreskippedcluster -o >> uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name >> (cwd None) (commands:200) >> DEBUG (hsm/init) [storage.Misc.excCmd] SUCCESS: <err> = b''; <rc> = 0 >> (commands:221) >> INFO (hsm/init) [storage.HSM] FINISH HSM init succeeded in 0.43 >> seconds (hsm:410) >> DEBUG (hsm/init) [storage.HSM] FINISH thread <Thread(hsm/init, stopped >> daemon 140180442818304)> (concurrent:196) >> >> and is completely silent after that. >> >> Thanks, >> Milan >> >
_______________________________________________ Devel mailing list -- [email protected] To unsubscribe send an email to [email protected] Privacy Statement: https://www.ovirt.org/site/privacy-policy/ oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/ List Archives: https://lists.ovirt.org/archives/list/[email protected]/message/LIHPBTPAEI6WZCC6T4A47AT2AOLMIXQ5/
