On Tue, Mar 10, 2020 at 8:14 PM Nir Soffer <[email protected]> wrote:
>
> On Tue, Mar 10, 2020 at 7:03 PM Amit Bawer <[email protected]> wrote:
> >
> > Seems like a reproduction of
> > https://bugzilla.redhat.com/show_bug.cgi?id=1807050#c1
>
> Agree, because...
>
> > Snipped from
> > https://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/21146/artifact/basic-suite.el7.x86_64/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-host-1/_var_log/vdsm/vdsm.log:
> >
> > 2020-03-10 05:59:18,549-0400 ERROR (jsonrpc/3) [storage.LVM] vg
> > cceb9d83-7b76-4840-a189-c82f3c18760e has pv_count 2 but pv_names
> > ('/dev/mapper/3600140544bef7e411164e5f94e13b5d8',) (lvm:578)
> > 2020-03-10 05:59:18,551-0400 INFO (jsonrpc/3) [storage.StorageDomain]
> > sdUUID=cceb9d83-7b76-4840-a189-c82f3c18760e (blockSD:1192)
> > 2020-03-10 05:59:18,551-0400 DEBUG (jsonrpc/3) [common.commands]
> > /usr/bin/taskset --cpu-list 0-1 /usr/bin/sudo -n /sbin/lvm vgck --config
> > 'devices { preferred_names=["^/dev/mapper/"] ignore_suspended_devices=1
> > write_cache_state=0 disable_after_error_count=3
> > filter=["a|^/dev/mapper/3600140544bef7e411164e5f94e13b5d8$|", "r|.*|"]
> > hints="none" } global { locking_type=1 prioritise_write_locks=1
> > wait_for_locks=1 use_lvmetad=0 } backup { retain_min=50 retain_days=0 }'
> > cceb9d83-7b76-4840-a189-c82f3c18760e (cwd None) (commands:153)
> > 2020-03-10 05:59:18,634-0400 DEBUG (jsonrpc/3) [common.commands] FAILED:
> > <err> = b" WARNING: Couldn't find device with uuid
> > FH6lfD-DZus-6Ndn-tkr8-5Hsy-lt2c-CDRPDU.\n WARNING: VG
> > cceb9d83-7b76-4840-a189-c82f3c18760e is missing PV
> > FH6lfD-DZus-6Ndn-tkr8-5Hsy-lt2c-CDRPDU.\n The volume group is missing 1
> > physical volumes.\n"; <rc> = 5 (commands:185)
> > 2020-03-10 05:59:18,637-0400 INFO (jsonrpc/3) [vdsm.api] FINISH
> > getStorageDomainInfo error=Domain is either partially accessible or
> > entirely inaccessible: ('cceb9d83-7b76-4840-a189-c82f3c18760e: [" WARNING:
> > Couldn\'t find device with uuid FH6lfD-DZus-6Ndn-tkr8-5Hsy-lt2c-CDRPDU.",
> > \' WARNING: VG cceb9d83-7b76-4840-a189-c82f3c18760e is missing PV
> > FH6lfD-DZus-6Ndn-tkr8-5Hsy-lt2c-CDRPDU.\', \' The volume group is missing
> > 1 physical volumes.\']',) from=::ffff:192.168.201.4,47796,
> > flow_id=5f02a1ec-db37-470d-b329-41b22f23582b,
> > task_id=9be86ca4-49ac-47ea-b0e2-8182e33924ff (api:52)
>
> This command was run only once. Usually, when a command using a specific filter
> (e.g. filter=["a|^/dev/mapper/3600140544bef7e411164e5f94e13b5d8$|", "r|.*|"])
> fails, we rebuild the filter. If the new filter is different (e.g. it has
> more devices), we run the command again.
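A minimal sketch of the retry behavior described above, for readers who do not know the code. This is not the actual vdsm implementation: build_filter, run_with_retry, and the run/list_devices callables are hypothetical names used only for illustration. Only the filter string format is taken from the log.

```python
def build_filter(devices):
    """Build an LVM filter option accepting only the given device paths."""
    items = ['"a|^%s$|"' % dev for dev in sorted(devices)]
    items.append('"r|.*|"')  # reject everything else
    return "filter=[%s]" % ", ".join(items)


def run_with_retry(run, list_devices, cached_devices):
    """
    Run a command with a filter built from the cached devices. If it
    fails, rebuild the filter from the devices currently on the host and
    retry only if the filter changed.

    run(filter) returns (rc, out, err); list_devices() returns device
    paths. Both are hypothetical callables supplied by the caller.
    Returns (rc, out, err, attempts).
    """
    old_filter = build_filter(cached_devices)
    rc, out, err = run(old_filter)
    if rc == 0:
        return rc, out, err, 1

    new_filter = build_filter(list_devices())
    if new_filter == old_filter:
        # Same devices as before: a second run would fail the same way.
        return rc, out, err, 1

    rc, out, err = run(new_filter)
    return rc, out, err, 2
```

With this logic, a single failed run means the rebuilt filter was identical to the one already used.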
>
> Since we ran the command only once, we know that the filter was correct,
> so we had only /dev/mapper/3600140544bef7e411164e5f94e13b5d8 on the host.
> The other PV was not available when this command was run.
>
> We started the connection here:
>
> 2020-03-10 05:59:17,364-0400 DEBUG (jsonrpc/2) [common.commands]
> /usr/bin/taskset --cpu-list 0-1 /usr/bin/sudo -n /sbin/iscsiadm -m
> node -T iqn.2014-07.org.ovirt:storage -I default -p
> 192.168.200.4:3260,1 -l (cwd None) (commands:153)
> 2020-03-10 05:59:17,504-0400 DEBUG (jsonrpc/2) [common.commands]
> SUCCESS: <err> = b''; <rc> = 0 (commands:98)
>
> And finished here:
>
> 2020-03-10 05:59:17,610-0400 DEBUG (jsonrpc/2) [common.commands]
> /usr/bin/taskset --cpu-list 0-1 /sbin/udevadm settle --timeout=5 (cwd
> None) (commands:153)
> 2020-03-10 05:59:17,787-0400 DEBUG (jsonrpc/2) [common.commands]
> SUCCESS: <err> = b''; <rc> = 0 (commands:98)
>
> In /var/log/messages we see the connection starting here:
>
> Mar 10 05:59:17 lago-basic-suite-master-host-1 iscsid[21973]: iscsid:
> Connection2:0 to [target: iqn.2014-07.org.ovirt:storage, portal:
> 192.168.200.4,3260] through [iface: default] is operational now
>
> Adding devices:
>
> Mar 10 05:59:17 lago-basic-suite-master-host-1 kernel: sd 3:0:0:0:
> [sdf] 41943040 512-byte logical blocks: (21.5 GB/20.0 GiB)
> Mar 10 05:59:17 lago-basic-suite-master-host-1 kernel: sd 3:0:0:4:
> [sdg] 41943040 512-byte logical blocks: (21.5 GB/20.0 GiB)
> Mar 10 05:59:17 lago-basic-suite-master-host-1 kernel: sd 3:0:0:3:
> [sdh] 41943040 512-byte logical blocks: (21.5 GB/20.0 GiB)
> Mar 10 05:59:17 lago-basic-suite-master-host-1 kernel: sd 3:0:0:2:
> [sdi] 41943040 512-byte logical blocks: (21.5 GB/20.0 GiB)
> Mar 10 05:59:17 lago-basic-suite-master-host-1 kernel: sd 3:0:0:1:
> [sdj] 41943040 512-byte logical blocks: (21.5 GB/20.0 GiB)
>
> Multipath adding devices to maps:
>
> Mar 10 05:59:18 lago-basic-suite-master-host-1 multipathd[21959]: sdb
> [8:16]: path added to devmap 36001405b39cc4e33bd24f35a81c0c140
> Mar 10 05:59:18 lago-basic-suite-master-host-1 multipathd[21959]: sdc
> [8:32]: path added to devmap 36001405a70a062950224fc985825aa0d
> Mar 10 05:59:18 lago-basic-suite-master-host-1 multipathd[21959]: sda
> [8:0]: path added to devmap 3600140559c49ea12b0d4dc1994ba4ef0
> Mar 10 05:59:18 lago-basic-suite-master-host-1 multipathd[21959]: sde
> [8:64]: path added to devmap 36001405f277c71b13814669926ffbae4
> Mar 10 05:59:18 lago-basic-suite-master-host-1 multipathd[21959]: sdi
> [8:128]: path added to devmap 3600140544bef7e411164e5f94e13b5d8 <<<
> This is probably the missing device
>
> So we need to wait until multipath has handled all the devices.
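One way to picture "waiting for multipath" is to poll for the expected device nodes with a timeout. The sketch below is only an illustration and is not the code from the patch mentioned in this thread. wait_for_devices and its parameters are hypothetical, and it assumes the caller already knows which /dev/mapper paths to expect, which may not be true on a real host.

```python
import os
import time


def wait_for_devices(expected, exists=os.path.exists, timeout=10.0,
                     interval=0.1, clock=time.monotonic, sleep=time.sleep):
    """
    Poll until every path in expected exists or the timeout expires.

    Returns the set of paths that are still missing (empty on success).
    exists, clock and sleep are injectable so the function can be tested
    without real devices.
    """
    deadline = clock() + timeout
    missing = {path for path in expected if not exists(path)}
    while missing and clock() < deadline:
        sleep(interval)
        # Re-check only the paths that were missing on the previous pass.
        missing = {path for path in missing if not exists(path)}
    return missing
```

A caller would run LVM commands only when the returned set is empty, and report the missing paths otherwise.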
>
> https://gerrit.ovirt.org/c/107206/ should avoid this issue.
Merged now. We should not see this issue anymore.
> Benny, please try to run OST.
>
> > 2020-03-10 05:59:18,637-0400 ERROR (jsonrpc/3) [storage.TaskManager.Task]
> > (Task='9be86ca4-49ac-47ea-b0e2-8182e33924ff') Unexpected error (task:880)
> > Traceback (most recent call last):
> > File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 887,
> > in _run
> > return fn(*args, **kargs)
> > File "<decorator-gen-129>", line 2, in getStorageDomainInfo
> > File "/usr/lib/python3.6/site-packages/vdsm/common/api.py", line 50, in
> > method
> > ret = func(*args, **kwargs)
> > File "/usr/lib/python3.6/site-packages/vdsm/storage/hsm.py", line 2752,
> > in getStorageDomainInfo
> > dom = self.validateSdUUID(sdUUID)
> > File "/usr/lib/python3.6/site-packages/vdsm/storage/hsm.py", line 310, in
> > validateSdUUID
> > sdDom.validate()
> > File "/usr/lib/python3.6/site-packages/vdsm/storage/blockSD.py", line
> > 1193, in validate
> > lvm.chkVG(self.sdUUID)
> > File "/usr/lib/python3.6/site-packages/vdsm/storage/lvm.py", line 1278,
> > in chkVG
> > raise se.StorageDomainAccessError("%s: %s" % (vgName, err))
> > vdsm.storage.exception.StorageDomainAccessError: Domain is either partially
> > accessible or entirely inaccessible:
> > ('cceb9d83-7b76-4840-a189-c82f3c18760e: [" WARNING: Couldn\'t find device
> > with uuid FH6lfD-DZus-6Ndn-tkr8-5Hsy-lt2c-CDRPDU.", \' WARNING: VG
> > cceb9d83-7b76-4840-a189-c82f3c18760e is missing PV
> > FH6lfD-DZus-6Ndn-tkr8-5Hsy-lt2c-CDRPDU.\', \' The volume group is missing
> > 1 physical volumes.\']',)
> > 2020-03-10 05:59:18,637-0400 INFO (jsonrpc/3) [storage.TaskManager.Task]
> > (Task='9be86ca4-49ac-47ea-b0e2-8182e33924ff') aborting: Task is aborted:
> > 'value=Domain is either partially accessible or entirely inaccessible:
> > (\'cceb9d83-7b76-4840-a189-c82f3c18760e: [" WARNING: Couldn\\\'t find
> > device with uuid FH6lfD-DZus-6Ndn-tkr8-5Hsy-lt2c-CDRPDU.", \\\' WARNING:
> > VG cceb9d83-7b76-4840-a189-c82f3c18760e is missing PV
> > FH6lfD-DZus-6Ndn-tkr8-5Hsy-lt2c-CDRPDU.\\\', \\\' The volume group is
> > missing 1 physical volumes.\\\']\',) abortedcode=379' (task:1190)
> > 2020-03-10 05:59:18,638-0400 ERROR (jsonrpc/3) [storage.Dispatcher] FINISH
> > getStorageDomainInfo error=Domain is either partially accessible or
> > entirely inaccessible: ('cceb9d83-7b76-4840-a189-c82f3c18760e: [" WARNING:
> > Couldn\'t find device with uuid FH6lfD-DZus-6Ndn-tkr8-5Hsy-lt2c-CDRPDU.",
> > \' WARNING: VG cceb9d83-7b76-4840-a189-c82f3c18760e is missing PV
> > FH6lfD-DZus-6Ndn-tkr8-5Hsy-lt2c-CDRPDU.\', \' The volume group is missing
> > 1 physical volumes.\']',) (dispatcher:83)
> >
> >
> > I suggest trying again once the BZ is fixed on master.
> >
> > On Tue, Mar 10, 2020 at 1:36 PM Yedidyah Bar David <[email protected]> wrote:
> > >
> > > Hi all,
> > >
> > > Anyone looking at this?
> > >
> > > See e.g.:
> > >
> > > https://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/21146/
> > >
> > > Thanks,
> > > --
> > > Didi
> > > _______________________________________________
> > > Devel mailing list -- [email protected]
> > > To unsubscribe send an email to [email protected]
> > > Privacy Statement: https://www.ovirt.org/privacy-policy.html
> > > oVirt Code of Conduct:
> > > https://www.ovirt.org/community/about/community-guidelines/
> > > List Archives:
> > > https://lists.ovirt.org/archives/list/[email protected]/message/ED57V5XW4B3WC7AM5GRYDE6CJJL7PWPM/