Yes, I am!

[root@osd01 ~]# uname -a
Linux osd01.tor.medavail.net 4.18.11-1.el7.elrepo.x86_64

[root@osd03 latest]# uname -a
Linux osd03.tor.medavail.net 4.18.11-1.el7.elrepo.x86_64

On Wed, 10 Oct 2018 at 16:22, Jason Dillaman <[email protected]> wrote:
> Are you running the same kernel version on both nodes?
> On Wed, Oct 10, 2018 at 4:18 PM Steven Vacaroaia <[email protected]> wrote:
> >
> > so, it seems OSD03 is having issues when creating disks ( I can create target and hosts ) - here is an excerpt from api.log
> > Please note I can create disk on the other node
> >
> > 2018-10-10 16:03:03,369 DEBUG [lun.py:381:allocate()] - LUN.allocate starting, listing rbd devices
> > 2018-10-10 16:03:03,381 DEBUG [lun.py:384:allocate()] - rados pool 'rbd' contains the following - [u'datastoretest', u'datastoretest2', u'disk1', u'dstest', u'scstimage1', u'test2']
> > 2018-10-10 16:03:03,382 DEBUG [lun.py:389:allocate()] - Hostname Check - this host is osd03, target host for allocations is osd03
> > 2018-10-10 16:03:03,428 DEBUG [common.py:272:add_item()] - (Config.add_item) config updated to {u'updated': u'2018/10/10 20:02:24', u'disks': {'rbd.dstest2': {'created': '2018/10/10 20:03:03'}}, u'created': u'2018/10/10 19:56:05', u'clients': {}, u'epoch': 4, u'version': 3, u'gateways': {u'osd03': {u'gateway_ip_list': [u'10.10.30.183', u'10.10.30.181'], u'active_luns': 0, u'created': u'2018/10/10 20:02:12', u'updated': u'2018/10/10 20:02:24', u'iqn': u'iqn.2003-01.com.redhat.iscsi-gw:iscsi-ceph', u'inactive_portal_ips': [u'10.10.30.181'], u'portal_ip_address': u'10.10.30.183', u'tpgs': 2}, u'osd01': {u'gateway_ip_list': [u'10.10.30.183', u'10.10.30.181'], u'active_luns': 0, u'created': u'2018/10/10 20:02:24', u'updated': u'2018/10/10 20:02:24', u'iqn': u'iqn.2003-01.com.redhat.iscsi-gw:iscsi-ceph', u'inactive_portal_ips': [u'10.10.30.183'], u'portal_ip_address': u'10.10.30.181', u'tpgs': 2}, u'iqn': u'iqn.2003-01.com.redhat.iscsi-gw:iscsi-ceph', u'ip_list': [u'10.10.30.183', u'10.10.30.181'], u'created': u'2018/10/10 20:01:39'},
u'groups': {}} > > 2018-10-10 16:03:03,429 INFO [lun.py:405:allocate()] - > (LUN.allocate) created rbd/dstest2 successfully > > 2018-10-10 16:03:03,429 DEBUG [lun.py:444:allocate()] - Check the rbd > image size matches the request > > 2018-10-10 16:03:03,429 DEBUG [lun.py:467:allocate()] - Begin > processing LIO mapping > > 2018-10-10 16:03:03,429 INFO [lun.py:656:add_dev_to_lio()] - > (LUN.add_dev_to_lio) Adding image 'rbd.dstest2' to LIO > > 2018-10-10 16:03:03,429 DEBUG [lun.py:666:add_dev_to_lio()] - > control="max_data_area_mb=8" > > 2018-10-10 16:03:03,438 INFO [_internal.py:87:_log()] - 127.0.0.1 - > - [10/Oct/2018 16:03:03] "PUT /api/_disk/rbd.dstest2 HTTP/1.1" 500 - > > 2018-10-10 16:03:03,439 ERROR [rbd-target-api:1810:call_api()] - > _disk change on 127.0.0.1 failed with 500 > > 2018-10-10 16:03:03,439 INFO [_internal.py:87:_log()] - 127.0.0.1 - > - [10/Oct/2018 16:03:03] "PUT /api/disk/rbd.dstest2 HTTP/1.1" 500 - > > > > > > I remove gateway.conf object and install latest rpms on it as follows > but the error persist > > Alos rebooted the server > > > > ceph-iscsi-cli-2.7-54.g9b18a3b.el7.noarch.rpm > > python2-kmod-0.9-20.fc29.x86_64.rpm > > python2-rtslib-2.1.fb67-3.fc28.noarch.rpm > > ceph-iscsi-config-2.6-42.gccca57d.el7.noarch.rpm > > python2-pyudev-0.21.0-8.fc29.noarch.rpm > > tcmu-runner-1.4.0-1.el7.x86_64.rpm > > > > > > > > On Wed, 10 Oct 2018 at 13:52, Mike Christie <[email protected]> wrote: > >> > >> On 10/10/2018 08:21 AM, Steven Vacaroaia wrote: > >> > Hi Jason, > >> > Thanks for your prompt responses > >> > > >> > I have used same iscsi-gateway.cfg file - no security changes - just > >> > added prometheus entry > >> > There is no iscsi-gateway.conf but the gateway.conf object is created > >> > and has correct entries > >> > > >> > iscsi-gateway.cfg is identical and contains the following > >> > > >> > [config] > >> > cluster_name = ceph > >> > gateway_keyring = ceph.client.admin.keyring > >> > api_secure = false > >> > trusted_ip_list = > 
>> > > 10.10.30.181,10.10.30.182,10.10.30.183,10.10.30.184,10.10.30.185,10.10.30.186 > >> > prometheus_host = 0.0.0.0 > >> > > >> > > >> > > >> > I am running the disks commands from OSD01 and they fail with the > following > >> > > >> > INFO [gateway.py:344:load_config()] - (Gateway.load_config) > successfully > >> > loaded existing target definition > >> > 2018-10-10 09:04:48,956 DEBUG [gateway.py:423:map_luns()] - > >> > processing tpg2 > >> > 2018-10-10 09:04:48,956 DEBUG [gateway.py:428:map_luns()] - > >> > rbd.dstest needed mapping to tpg2 > >> > 2018-10-10 09:04:48,958 INFO > >> > [gateway.py:403:bind_alua_group_to_lun()] - Setup group ao for > >> > rbd.dstest on tpg 2 (state 0, owner True, failover type 1) > >> > 2018-10-10 09:04:48,958 DEBUG > >> > [gateway.py:405:bind_alua_group_to_lun()] - Setting Luns tg_pt_gp to > ao > >> > 2018-10-10 09:04:48,959 DEBUG > >> > [gateway.py:409:bind_alua_group_to_lun()] - Bound rbd.dstest on tpg2 > to ao > >> > 2018-10-10 09:04:48,959 DEBUG [gateway.py:423:map_luns()] - > >> > processing tpg1 > >> > 2018-10-10 09:04:48,959 DEBUG [gateway.py:428:map_luns()] - > >> > rbd.dstest needed mapping to tpg1 > >> > 2018-10-10 09:04:48,960 INFO > >> > [gateway.py:403:bind_alua_group_to_lun()] - Setup group ano1 for > >> > rbd.dstest on tpg 1 (state 1, owner False, failover type 1) > >> > 2018-10-10 09:04:48,960 DEBUG > >> > [gateway.py:405:bind_alua_group_to_lun()] - Setting Luns tg_pt_gp to > ano1 > >> > 2018-10-10 09:04:48,961 DEBUG > >> > [gateway.py:409:bind_alua_group_to_lun()] - Bound rbd.dstest on tpg1 > to ano1 > >> > 2018-10-10 09:04:48,963 INFO [_internal.py:87:_log()] - 127.0.0.1 > - > >> > - [10/Oct/2018 09:04:48] "PUT /api/_disk/rbd.dstest HTTP/1.1" 200 - > >> > 2018-10-10 09:04:48,965 INFO [rbd-target-api:1804:call_api()] - > >> > _disk update on 127.0.0.1, successful > >> > 2018-10-10 09:04:48,965 DEBUG [rbd-target-api:1789:call_api()] - > >> > processing GW 'osd03' > >> > 2018-10-10 09:04:49,039 ERROR 
[rbd-target-api:1810:call_api()] - > >> > _disk change on osd03 failed with 500 > >> > 2018-10-10 09:04:49,041 INFO [_internal.py:87:_log()] - 127.0.0.1 > - > >> > - [10/Oct/2018 09:04:49] "PUT /api/disk/rbd.dstest HTTP/1.1" 500 - > >> > > >> > > >> > on OSD03 there is the folowing "error" > >> > > >> > INFO [lun.py:656:add_dev_to_lio()] - (LUN.add_dev_to_lio) Adding > image > >> > 'rbd.dstest' to LIO > >> > 2018-10-10 09:04:49,037 DEBUG [lun.py:666:add_dev_to_lio()] - > >> > control="max_data_area_mb=8" > >> > > >> > Amazingly enough, gwcli on OSD03 show the disk created but on OSD01 it > >> > does not > >> > If I restart gwcli on OSD01 , disk is there but it cannot be added to > >> > the host because it image does not exist ??? > >> > >> What is the output of > >> > >> systemctl status rbd-target-api > >> systemctl status rbd-target-gw > >> > >> Is api in a failed state or does it indicate it has been crashing and > >> restarting? > >> > >> Does /var/log/messages show that rbd-target-api is crashing and > >> restarting and could you attach the stack trace? The > >> /var/log/rbd-target-api log will show > >> > >> Does > >> > >> gwcli ls > >> > >> show it cannot reach the remote gateways? > >> > >> > >> > > >> > adding the disk to the hosts failed with "client masking update" > error > >> > > >> > disk add rbd.dstest > >> > CMD: ../hosts/<client_iqn> disk action=add disk=rbd.dstest > >> > Client 'iqn.1998-01.com.vmware:test-2d06960a' update - add disk > rbd.dstest > >> > disk add for 'rbd.dstest' against iqn.1998-01.com.vmware:test-2d06960a > >> > failed > >> > client masking update failed on osd03. 
Client update failed > >> > > >> > rbd-target-api:1216:_update_client()] - client update failed on > >> > iqn.1998-01.com.vmware:test-2d06960a : Non-existent images > >> > ['rbd.dstest'] requested for iqn.1998-01.com.vmware:test-2d06960a > >> > > >> > However, the image is listed on gwcli and using rados ls > >> > > >> > /disks> ls > >> > o- disks > >> > > .......................................................................................................... > >> > [150G, Disks: 1] > >> > o- rbd.dstest > >> > > .................................................................................................... > >> > [dstest (150G)] > >> > > >> > rados -p rbd ls | grep dstest > >> > rbd_id.dstest > >> > > >> > > >> > > >> > I would really appreciate any help / suggestions > >> > > >> > Thanks > >> > Steven > >> > > >> > On Tue, 9 Oct 2018 at 16:35, Jason Dillaman <[email protected] > >> > <mailto:[email protected]>> wrote: > >> > > >> > Anything in the rbd-target-api.log on osd03 to indicate why it > failed? > >> > > >> > Since you replaced your existing "iscsi-gateway.conf", do your > >> > security settings still match between the two hosts (i.e. on the > >> > trusted_ip_list, same api_XYZ options)? > >> > On Tue, Oct 9, 2018 at 4:25 PM Steven Vacaroaia <[email protected] > >> > <mailto:[email protected]>> wrote: > >> > > > >> > > so the gateways are up but I have issues adding disks ( i.e. 
if > I > >> > do it on one gatway it does not show on the other - however, > after I > >> > restart the rbd-target services I am seeing the disks ) > >> > > Thanks in advance for taking the trouble to provide advice / > guidance > >> > > > >> > > 2018-10-09 16:16:08,968 INFO > [rbd-target-api:1804:call_api()] > >> > - _clientlun update on 127.0.0.1, successful > >> > > 2018-10-09 16:16:08,968 DEBUG > [rbd-target-api:1789:call_api()] > >> > - processing GW 'osd03' > >> > > 2018-10-09 16:16:08,987 ERROR > [rbd-target-api:1810:call_api()] > >> > - _clientlun change on osd03 failed with 500 > >> > > 2018-10-09 16:16:08,987 DEBUG > [rbd-target-api:1827:call_api()] > >> > - failed on osd03, applied to 127.0.0.1, aborted osd03. Client > >> > update failed > >> > > 2018-10-09 16:16:08,987 INFO [_internal.py:87:_log()] - > >> > 127.0.0.1 - - [09/Oct/2018 16:16:08] "PUT > >> > /api/clientlun/iqn.1998-01.com.vmware:test-2d06960a HTTP/1.1" 500 > - > >> > > > >> > > On Tue, 9 Oct 2018 at 15:42, Steven Vacaroaia <[email protected] > >> > <mailto:[email protected]>> wrote: > >> > >> > >> > >> It worked. > >> > >> > >> > >> many thanks > >> > >> Steven > >> > >> > >> > >> On Tue, 9 Oct 2018 at 15:36, Jason Dillaman < > [email protected] > >> > <mailto:[email protected]>> wrote: > >> > >>> > >> > >>> Can you try applying [1] and see if that resolves your issue? 
> >> > >>> > >> > >>> [1] https://github.com/ceph/ceph-iscsi-config/pull/78 > >> > >>> On Tue, Oct 9, 2018 at 3:06 PM Steven Vacaroaia > >> > <[email protected] <mailto:[email protected]>> wrote: > >> > >>> > > >> > >>> > Thanks Jason > >> > >>> > > >> > >>> > adding prometheus_host = 0.0.0.0 to iscsi-gateway.cfg does > not > >> > work - the error message is > >> > >>> > > >> > >>> > "..rbd-target-gw: ValueError: invalid literal for int() with > >> > base 10: '0.0.0.0' " > >> > >>> > > >> > >>> > adding prometheus_exporter = false works > >> > >>> > > >> > >>> > However I'd like to use prometheus_exporter if possible > >> > >>> > Any suggestions will be appreciated > >> > >>> > > >> > >>> > Steven > >> > >>> > > >> > >>> > > >> > >>> > > >> > >>> > On Tue, 9 Oct 2018 at 14:25, Jason Dillaman > >> > <[email protected] <mailto:[email protected]>> wrote: > >> > >>> >> > >> > >>> >> You can try adding "prometheus_exporter = false" in your > >> > >>> >> "/etc/ceph/iscsi-gateway.cfg"'s "config" section if you > >> > aren't using > >> > >>> >> "cephmetrics", or try setting "prometheus_host = 0.0.0.0" > >> > since it > >> > >>> >> sounds like you have the IPv6 stack disabled. > >> > >>> >> > >> > >>> >> [1] > >> > > https://github.com/ceph/ceph-iscsi-config/blob/master/ceph_iscsi_config/settings.py#L90 > >> > >>> >> On Tue, Oct 9, 2018 at 2:09 PM Steven Vacaroaia > >> > <[email protected] <mailto:[email protected]>> wrote: > >> > >>> >> > > >> > >>> >> > here is some info from /var/log/messages ..in case > someone > >> > has the time to take a look > >> > >>> >> > > >> > >>> >> > Oct 9 13:58:35 osd03 systemd: Started Setup system to > >> > export rbd images through LIO. > >> > >>> >> > Oct 9 13:58:35 osd03 systemd: Starting Setup system to > >> > export rbd images through LIO... 
> >> > >>> >> > Oct 9 13:58:35 osd03 journal: Processing osd blacklist > >> > entries for this node > >> > >>> >> > Oct 9 13:58:35 osd03 journal: No OSD blacklist entries > found > >> > >>> >> > Oct 9 13:58:35 osd03 journal: Reading the configuration > >> > object to update local LIO configuration > >> > >>> >> > Oct 9 13:58:35 osd03 journal: Configuration does not > have > >> > an entry for this host(osd03) - nothing to define to LIO > >> > >>> >> > Oct 9 13:58:35 osd03 journal: Integrated Prometheus > >> > exporter is enabled > >> > >>> >> > Oct 9 13:58:35 osd03 journal: * Running on http:// > [::]:9287/ > >> > >>> >> > Oct 9 13:58:35 osd03 journal: Removing iSCSI target > from LIO > >> > >>> >> > Oct 9 13:58:35 osd03 journal: Removing LUNs from LIO > >> > >>> >> > Oct 9 13:58:35 osd03 journal: Active Ceph iSCSI gateway > >> > configuration removed > >> > >>> >> > Oct 9 13:58:35 osd03 rbd-target-gw: Traceback (most > recent > >> > call last): > >> > >>> >> > Oct 9 13:58:35 osd03 rbd-target-gw: File > >> > "/usr/bin/rbd-target-gw", line 5, in <module> > >> > >>> >> > Oct 9 13:58:35 osd03 rbd-target-gw: > >> > pkg_resources.run_script('ceph-iscsi-config==2.6', > 'rbd-target-gw') > >> > >>> >> > Oct 9 13:58:35 osd03 rbd-target-gw: File > >> > "/usr/lib/python2.7/site-packages/pkg_resources.py", line 540, in > >> > run_script > >> > >>> >> > Oct 9 13:58:35 osd03 rbd-target-gw: > >> > self.require(requires)[0].run_script(script_name, ns) > >> > >>> >> > Oct 9 13:58:35 osd03 rbd-target-gw: File > >> > "/usr/lib/python2.7/site-packages/pkg_resources.py", line 1462, in > >> > run_script > >> > >>> >> > Oct 9 13:58:35 osd03 rbd-target-gw: exec_(script_code, > >> > namespace, namespace) > >> > >>> >> > Oct 9 13:58:35 osd03 rbd-target-gw: File > >> > "/usr/lib/python2.7/site-packages/pkg_resources.py", line 41, in > exec_ > >> > >>> >> > Oct 9 13:58:35 osd03 rbd-target-gw: exec("""exec code in > >> > globs, locs""") > >> > >>> >> > Oct 9 13:58:35 osd03 rbd-target-gw: File 
"<string>", > line > >> > 1, in <module> > >> > >>> >> > Oct 9 13:58:35 osd03 rbd-target-gw: File > >> > > > "/usr/lib/python2.7/site-packages/ceph_iscsi_config-2.6-py2.7.egg/EGG-INFO/scripts/rbd-target-gw", > >> > line 432, in <module> > >> > >>> >> > Oct 9 13:58:35 osd03 rbd-target-gw: File > >> > > > "/usr/lib/python2.7/site-packages/ceph_iscsi_config-2.6-py2.7.egg/EGG-INFO/scripts/rbd-target-gw", > >> > line 379, in main > >> > >>> >> > Oct 9 13:58:35 osd03 rbd-target-gw: File > >> > "/usr/lib/python2.7/site-packages/flask/app.py", line 772, in run > >> > >>> >> > Oct 9 13:58:35 osd03 rbd-target-gw: run_simple(host, > port, > >> > self, **options) > >> > >>> >> > Oct 9 13:58:35 osd03 rbd-target-gw: File > >> > "/usr/lib/python2.7/site-packages/werkzeug/serving.py", line 710, > in > >> > run_simple > >> > >>> >> > Oct 9 13:58:35 osd03 rbd-target-gw: inner() > >> > >>> >> > Oct 9 13:58:35 osd03 rbd-target-gw: File > >> > "/usr/lib/python2.7/site-packages/werkzeug/serving.py", line 692, > in > >> > inner > >> > >>> >> > Oct 9 13:58:35 osd03 rbd-target-gw: passthrough_errors, > >> > ssl_context).serve_forever() > >> > >>> >> > Oct 9 13:58:35 osd03 rbd-target-gw: File > >> > "/usr/lib/python2.7/site-packages/werkzeug/serving.py", line 480, > in > >> > make_server > >> > >>> >> > Oct 9 13:58:35 osd03 rbd-target-gw: passthrough_errors, > >> > ssl_context) > >> > >>> >> > Oct 9 13:58:35 osd03 rbd-target-gw: File > >> > "/usr/lib/python2.7/site-packages/werkzeug/serving.py", line 410, > in > >> > __init__ > >> > >>> >> > Oct 9 13:58:35 osd03 rbd-target-gw: > >> > HTTPServer.__init__(self, (host, int(port)), handler) > >> > >>> >> > Oct 9 13:58:35 osd03 rbd-target-gw: File > >> > "/usr/lib64/python2.7/SocketServer.py", line 417, in __init__ > >> > >>> >> > Oct 9 13:58:35 osd03 rbd-target-gw: self.socket_type) > >> > >>> >> > Oct 9 13:58:35 osd03 rbd-target-gw: File > >> > "/usr/lib64/python2.7/socket.py", line 187, in __init__ > >> > >>> >> > Oct 9 13:58:35 osd03 
rbd-target-gw: _sock = > >> > _realsocket(family, type, proto) > >> > >>> >> > Oct 9 13:58:35 osd03 rbd-target-gw: socket.error: [Errno > >> > 97] Address family not supported by protocol > >> > >>> >> > Oct 9 13:58:35 osd03 systemd: rbd-target-gw.service: > main > >> > process exited, code=exited, status=1/FAILURE > >> > >>> >> > > >> > >>> >> > > >> > >>> >> > On Tue, 9 Oct 2018 at 13:16, Steven Vacaroaia > >> > <[email protected] <mailto:[email protected]>> wrote: > >> > >>> >> >> > >> > >>> >> >> Hi , > >> > >>> >> >> I am using Mimic 13.2 and kernel 4.18 > >> > >>> >> >> Was using gwcli 2.5 and decided to upgrade to latest > (2.7) > >> > as people reported improved performance > >> > >>> >> >> > >> > >>> >> >> What is the proper methodology ? > >> > >>> >> >> How should I troubleshoot this? > >> > >>> >> >> > >> > >>> >> >> > >> > >>> >> >> > >> > >>> >> >> What I did ( and it broke it) was > >> > >>> >> >> > >> > >>> >> >> cd tcmu-runner; git pull ; make && make install > >> > >>> >> >> cd ceph-iscsi-cli; git pull;python setup.py install > >> > >>> >> >> cd ceph-iscsi-config;git pull; python setup.py install > >> > >>> >> >> cd rtslib-fb;git pull; python setup.py install > >> > >>> >> >> > >> > >>> >> >> After a reboot, I cannot start rbd-target-gw and the > logs > >> > are not very helpful > >> > >>> >> >> ( Note: > >> > >>> >> >> I removed /etc/ceph/iscsi-gateway.cfg and > gateway.conf > >> > object as I wanted to start fresh > >> > >>> >> >> /etc/ceph/iscsi-gatway.conf was left unchanged ) > >> > >>> >> >> > >> > >>> >> >> > >> > >>> >> >> 2018-10-09 12:47:50,593 [ INFO] - Processing osd > >> > blacklist entries for this node > >> > >>> >> >> 2018-10-09 12:47:50,893 [ INFO] - No OSD blacklist > >> > entries found > >> > >>> >> >> 2018-10-09 12:47:50,893 [ INFO] - Reading the > >> > configuration object to update local LIO configuration > >> > >>> >> >> 2018-10-09 12:47:50,893 [ INFO] - Configuration does > >> > not have an entry for this host(osd03) - 
nothing to define to LIO > >> > >>> >> >> 2018-10-09 12:47:50,893 [ INFO] - Integrated > Prometheus > >> > exporter is enabled > >> > >>> >> >> 2018-10-09 12:47:50,895 [ INFO] - * Running on > >> > http://[::]:9287/ > >> > >>> >> >> 2018-10-09 12:47:50,896 [ INFO] - Removing iSCSI > target > >> > from LIO > >> > >>> >> >> 2018-10-09 12:47:50,896 [ INFO] - Removing LUNs from > LIO > >> > >>> >> >> 2018-10-09 12:47:50,896 [ INFO] - Active Ceph iSCSI > >> > gateway configuration removed > >> > >>> >> >> > >> > >>> >> >> Many thanks > >> > >>> >> >> Steven > >> > >>> >> >> > >> > >>> >> > _______________________________________________ > >> > >>> >> > ceph-users mailing list > >> > >>> >> > [email protected] <mailto: > [email protected]> > >> > >>> >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > >> > >>> >> > >> > >>> >> > >> > >>> >> > >> > >>> >> -- > >> > >>> >> Jason > >> > >>> > >> > >>> > >> > >>> > >> > >>> -- > >> > >>> Jason > >> > > >> > > >> > > >> > -- > >> > Jason > >> > > >> > > >> > > >> > _______________________________________________ > >> > ceph-users mailing list > >> > [email protected] > >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > >> > > >> > > > -- > Jason >
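For reference, the "[Errno 97] Address family not supported by protocol" traceback quoted earlier in this thread is raised when the exporter tries to open a socket on "::" and, as Jason suggested, the IPv6 stack is probably disabled on that node. A minimal Python sketch of how one could check this on a gateway before choosing a value for prometheus_host. The helper names ipv6_available and pick_bind_host are hypothetical and not part of ceph-iscsi-config; this only probes the local socket layer and does not read iscsi-gateway.cfg.

```python
import socket


def ipv6_available():
    """Return True if this host can create an IPv6 TCP socket."""
    # has_ipv6 only says Python was built with IPv6 support; the kernel
    # can still refuse the address family at runtime (Errno 97 on Linux).
    if not socket.has_ipv6:
        return False
    try:
        sock = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
    except OSError:
        # Typically EAFNOSUPPORT when the IPv6 stack is disabled.
        return False
    sock.close()
    return True


def pick_bind_host():
    """Pick a wildcard listen address that this host can actually bind."""
    return "::" if ipv6_available() else "0.0.0.0"


if __name__ == "__main__":
    print("IPv6 usable:", ipv6_available())
    print("suggested listen address:", pick_bind_host())
```

If this reports that IPv6 is not usable on osd03, that would be consistent with the traceback and with the suggestion to set prometheus_host = 0.0.0.0 (or prometheus_exporter = false) in the [config] section.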
