On Tue, Feb 5, 2019 at 9:25 PM Nir Soffer <[email protected]> wrote:

> On Tue, Feb 5, 2019 at 7:15 PM Milan Zamazal <[email protected]> wrote:
>
>> Hi, I'm trying to run Vdsm with Python 3, which exposes various Python 3
>> incompatibilities in Vdsm code.  I'm currently semi-stuck on Vdsm
>> crashing very shortly after startup with:
>>
>>   Panic: Error initializing IRS
>>   Traceback (most recent call last):
>>     File "/usr/lib/python3.6/site-packages/vdsm/vdsmd.py", line 95, in
>> serve_clients
>>       irs = Dispatcher(HSM())
>>     File "/usr/lib/python3.6/site-packages/vdsm/storage/dispatcher.py",
>> line 46, in __init__
>>       self._exposeFunctions(obj)
>>     File "/usr/lib/python3.6/site-packages/vdsm/storage/dispatcher.py",
>> line 56, in _exposeFunctions
>>       if hasattr(funcObj, _EXPORTED_ATTRIBUTE) and callable(funcObj):
>>     File "/usr/lib/python3.6/site-packages/vdsm/storage/sp.py", line 85,
>> in __getattr__
>>       raise se.StoragePoolNotConnected
>>   vdsm.storage.exception.StoragePoolNotConnected: Storage pool not
>> connected: ()
>>
>
> This should be fixed first. exposing HSM function should not raise this
> error.
>

Can you check if https://gerrit.ovirt.org/c/97590/ fixes this issue?


> The best way would be to write tests for the dispatcher and run the tests
> with python 3.
> When the dispatcher is ported to python 3 you can try again to start vdsm.
>
> Even if you port entire vdsm to python 3, sanlock does not ship yet
> python 3 bindings, so you cannot have any storage except local storage,
> iso domain and export domain.
>
> That wouldn't be a problem by itself, errors are expected in Python 3
>> experiments.  However the problem is that this is often the *only* error
>> I receive.
>>
>> I've found out that HSM() call spawns threads, which apparently don't
>> have chance to report their errors before the panic above kills Vdsm.
>> For instance, there was a string-byte mismatch in running iscsiadm
>> scanning commands.  Only after I added time.sleep to the end of
>> HSM.__init__, I got an error + traceback from that and could see what's
>> wrong.  That worries me a bit, because it might mean that if some
>> initialization external commands are not fast enough, Vdsm may not be
>> able to start.
>>
>
> The real fix for this issue to eliminate the storage startup code.
>
> Storage should do the minimal initialization needed (e.g. start monitoring
> udev events) and wait for engine. When engine ask to connect to storage,
> vdsm should the needed scanning and lvm initialization that is done now
> during startup.
>
> Then if some storage code fail, only connecting to this kind of storage
> would fail and vdsm will not panic.
>
> As temporary solution you can disable the storage refresh thread
> started in HSM.__init__, and set HSM.ready to True immediately.
> This will have similar effect to the long term fix.
>
> After fixing some problems, I get the panic again, even with the sleep
>> at the end of HSM.__init__, and again without any other error.  It's
>> difficult to proceed in such a situation -- any tips what to look at or
>> to look for are welcome.
>>
>
> You can disable irs in the config, but then you will have no storage.
> Maybe this is good enough at this point.
>
> Nir
>
>
>>
>> FWIW, my vdsm.log now ends with
>>
>>   INFO  (MainThread) [storage.check] Starting check service (check:91)
>>   DEBUG (check/loop) [root] START thread <Thread(check/loop, started
>> daemon 140180434425600)> (func=<bound method EventLoop.run_forever of
>> <EventLoop running=True closed=False at 0x140181096264704>>, args=(),
>> kwargs={}) (concurrent:193)
>>   INFO  (check/loop) [storage.asyncevent] Starting <EventLoop
>> running=True closed=False at 0x140181096264704> (asyncevent:125)
>>   DEBUG (hsm/init) [storage.SamplingMethod] Returning last result
>> (misc:386)
>>   DEBUG (hsm/init) [storage.SamplingMethod] Trying to enter sampling
>> method (vdsm.storage.hba.<function rescan at 0x7f7e7a597e18>) (misc:376)
>>   DEBUG (hsm/init) [storage.SamplingMethod] Got in to sampling method
>> (misc:379)
>>   DEBUG (hsm/init) [storage.HBA] Starting scan (hba:58)
>>   DEBUG (hsm/init) [storage.HBA] Scan finished (hba:64)
>>   DEBUG (hsm/init) [storage.SamplingMethod] Returning last result
>> (misc:386)
>>   DEBUG (hsm/init) [root] /usr/bin/taskset --cpu-list 0-0 /sbin/udevadm
>> settle --timeout=5 (cwd None) (commands:200)
>>   DEBUG (hsm/init) [root] SUCCESS: <err> = b''; <rc> = 0 (commands:221)
>>   DEBUG (hsm/init) [storage.SamplingMethod] Returning last result
>> (misc:386)
>>   DEBUG (hsm/init) [storage.Misc.excCmd] /usr/bin/taskset --cpu-list 0-0
>> /usr/bin/sudo -n /sbin/lvm pvs --config 'devices {
>> preferred_names=["^/dev/mapper/"]  ignore_suspended_devices=1
>> write_cache_state=0  disable_after_error_count=3  filter=["r|.*|"] } global
>> {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1
>> use_lvmetad=0 } backup {  retain_min=50  retain_days=0 }' --noheadings
>> --units b --nosuffix --separator '|' --ignoreskippedcluster -o
>> uuid,name,size,vg_name,vg_uuid,pe_start,pe_count,pe_alloc_count,mda_count,dev_size,mda_used_count
>> (cwd None) (commands:200)
>>   DEBUG (hsm/init) [storage.Misc.excCmd] SUCCESS: <err> = b''; <rc> = 0
>> (commands:221)
>>   DEBUG (hsm/init) [storage.Misc.excCmd] /usr/bin/taskset --cpu-list 0-0
>> /usr/bin/sudo -n /sbin/lvm vgs --config 'devices {
>> preferred_names=["^/dev/mapper/"]  ignore_suspended_devices=1
>> write_cache_state=0  disable_after_error_count=3  filter=["r|.*|"] } global
>> {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1
>> use_lvmetad=0 } backup {  retain_min=50  retain_days=0 }' --noheadings
>> --units b --nosuffix --separator '|' --ignoreskippedcluster -o
>> uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name
>> (cwd None) (commands:200)
>>   DEBUG (hsm/init) [storage.Misc.excCmd] SUCCESS: <err> = b''; <rc> = 0
>> (commands:221)
>>   DEBUG (hsm/init) [storage.Misc.excCmd] /usr/bin/taskset --cpu-list 0-0
>> /usr/bin/sudo -n /sbin/lvm lvs --config 'devices {
>> preferred_names=["^/dev/mapper/"]  ignore_suspended_devices=1
>> write_cache_state=0  disable_after_error_count=3  filter=["r|.*|"] } global
>> {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1
>> use_lvmetad=0 } backup {  retain_min=50  retain_days=0 }' --noheadings
>> --units b --nosuffix --separator '|' --ignoreskippedcluster -o
>> uuid,name,vg_name,attr,size,seg_start_pe,devices,tags (cwd None)
>> (commands:200)
>>   DEBUG (hsm/init) [storage.Misc.excCmd] SUCCESS: <err> = b''; <rc> = 0
>> (commands:221)
>>   DEBUG (hsm/init) [storage.Misc.excCmd] /usr/bin/taskset --cpu-list 0-0
>> /usr/bin/sudo -n /sbin/lvm vgs --config 'devices {
>> preferred_names=["^/dev/mapper/"]  ignore_suspended_devices=1
>> write_cache_state=0  disable_after_error_count=3  filter=["r|.*|"] } global
>> {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1
>> use_lvmetad=0 } backup {  retain_min=50  retain_days=0 }' --noheadings
>> --units b --nosuffix --separator '|' --ignoreskippedcluster -o
>> uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name
>> (cwd None) (commands:200)
>>   DEBUG (hsm/init) [storage.Misc.excCmd] SUCCESS: <err> = b''; <rc> = 0
>> (commands:221)
>>   INFO  (hsm/init) [storage.HSM] FINISH HSM init succeeded in 0.43
>> seconds (hsm:410)
>>   DEBUG (hsm/init) [storage.HSM] FINISH thread <Thread(hsm/init, stopped
>> daemon 140180442818304)> (concurrent:196)
>>
>> and is completely silent after that.
>>
>> Thanks,
>> Milan
>>
>
_______________________________________________
Devel mailing list -- [email protected]
To unsubscribe send an email to [email protected]
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/[email protected]/message/LIHPBTPAEI6WZCC6T4A47AT2AOLMIXQ5/

Reply via email to