I have never had a permanent loss of a gateway, but I'm a believer in
Murphy's law and want to have a plan. Glad to hear that a solution is in
the works; curious when it might be available in a release. If sooner
rather than later, I'll plan to upgrade immediately; if it's far down
the queue, I'd like to know whether I should ready a standby server.

 Thanks so much for all your great work on this product.


Respectfully,

*Wes Dillingham*
[email protected]
LinkedIn <http://www.linkedin.com/in/wesleydillingham>


On Wed, Dec 4, 2019 at 11:18 AM Mike Christie <[email protected]> wrote:

> On 12/04/2019 08:26 AM, Gesiel Galvão Bernardes wrote:
> > Hi,
> >
> > On Wed, Dec 4, 2019 at 00:31, Mike Christie <[email protected]> wrote:
> >
> >     On 12/03/2019 04:19 PM, Wesley Dillingham wrote:
> >     > Thanks. If I am reading this correctly, the ability to remove an
> >     > iSCSI gateway would allow the remaining iSCSI gateways to take
> >     > over for the removed gateway's LUNs as of > 3.0. That's good; we
> >     > run 3.2. However, because the actual update of the central config
> >     > object happens from the to-be-deleted iSCSI gateway, regardless
> >     > of where the gwcli command is issued, it will fail to actually
> >     > remove said gateway from the object if that gateway is not
> >     > functioning.
> >
> >     Yes.
> >
> >     >
> >     > I guess this leaves the question still of how to proceed when one
> >     of the
> >     > iSCSI gateways fails permanently?  Is that possible, or is it
> >     > potentially possible other than manually intervening on the config
> >
> >     You could edit the gateway.cfg manually, but I would not do it,
> >     because it's error-prone.
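For context only, that manual route would look roughly like this. This is an illustrative sketch under the assumptions of this thread (the config object is named gateway.cfg and lives in the rbd pool); DRY_RUN defaults to printing each command instead of running it.

```shell
#!/bin/sh
# Error-prone manual edit of the central config object; shown only to
# illustrate what "edit the gateway.cfg manually" would involve.
# DRY_RUN defaults to "echo" (print only); set DRY_RUN= to execute.
DRY_RUN=${DRY_RUN-echo}

# Pull the JSON config object down to a local file ...
$DRY_RUN rados -p rbd get gateway.cfg gateway.cfg.json

# ... hand-edit gateway.cfg.json to drop the dead gateway's entries ...

# ... push the edited object back ...
$DRY_RUN rados -p rbd put gateway.cfg gateway.cfg.json

# ... and restart the API service on every surviving gateway node.
$DRY_RUN systemctl restart rbd-target-api
```

As the thread says, this is risky: a malformed edit can take out the whole target configuration, which is why running degraded or the full reset below is preferred.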
> >
> >     It's probably safest to run in degraded mode and wait for an updated
> >     ceph-iscsi package with a fix. If you are running into the problem
> >     right now, I can bump the priority.
> >
> > I permanently lost a gateway. I cannot keep running "degraded",
> > because I need to add another gateway for redundancy, and that is not
> > allowed while the lost gateway is "offline".
> >
> > In this case, what can I do? If I create a new gateway with the same
> > name and IP as the lost one, and then try to use "delete" in gwcli, will
> > it work?
>
> Yes.
>
> If you can tolerate a temporary stop in service, you can also do the
> following as a workaround:
>
> 0. Stop applications accessing iscsi luns, and have the initiator log
> out of the iscsi target.
>
> 1. Stop ceph iscsi service. On all iscsi gw nodes do:
>
> systemctl stop rbd-target-api
>
> 2. Delete gateway.cfg. This will delete the configuration info like the
> target and its ACL and LUN mappings. It does not delete the actual
> images or pools that you have data on.
>
> rados -p rbd rm gateway.cfg
>
> 3. Start ceph iscsi services again. On all iscsi gw nodes do:
>
> systemctl start rbd-target-api
>
> 4. Re-set up the target with gwcli. For the image/disk setup stage, use
> the "attach" command instead of the "create" command:
>
> attach pool=your_pool image=image_name
>
> Then just re-add your target, ACLs and LUN mappings.
>
> 5. On the initiator side, log back in to the iscsi target.
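The scriptable middle of that procedure can be sketched as follows. This is an illustrative sketch, not an official tool: it assumes the config object is gateway.cfg in the rbd pool as in step 2, the backup step is an added precaution not in the original instructions, and DRY_RUN defaults to printing each command (set DRY_RUN= to execute for real; run the systemctl lines on every gateway node).

```shell
#!/bin/sh
# Sketch of steps 1-3 above. Steps 0, 4 and 5 (initiator logout/login
# and the gwcli re-setup with "attach") are manual and not scripted here.
# DRY_RUN defaults to "echo" so only the commands are printed;
# set DRY_RUN= (empty) to run them against a real cluster.
DRY_RUN=${DRY_RUN-echo}

# 1. Stop the ceph-iscsi API service (repeat on every gateway node).
$DRY_RUN systemctl stop rbd-target-api

# Optional safety net: keep a local copy of the config object first.
$DRY_RUN rados -p rbd get gateway.cfg gateway.cfg.bak

# 2. Delete the central config object. Target/ACL/LUN configuration is
# lost, but the RBD images and pools holding your data are untouched.
$DRY_RUN rados -p rbd rm gateway.cfg

# 3. Start the service again (repeat on every gateway node).
$DRY_RUN systemctl start rbd-target-api
```

Afterwards, re-create the target in gwcli, using "attach pool=your_pool image=image_name" for each existing disk so the images are reattached rather than created, then re-add ACLs and LUN mappings and log the initiators back in.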
>
>
> >
> >
> >
> >     > object? If it's not possible, would the best course of action be
> >     > to have standby hardware and quickly recreate the node, or
> >     > perhaps run the gateways more ephemerally, from a VM or
> >     > container?
> >     >
> >     > Thanks again.
> >     >
> >     > Respectfully,
> >     >
> >     > *Wes Dillingham*
> >     > [email protected] <mailto:[email protected]>
> >     <mailto:[email protected] <mailto:[email protected]>>
> >     > LinkedIn <http://www.linkedin.com/in/wesleydillingham>
> >     >
> >     >
> >     > On Tue, Dec 3, 2019 at 2:45 PM Mike Christie <[email protected]> wrote:
> >     >
> >     >     I do not think it's going to do what you want when the node
> >     >     you want to delete is down.
> >     >
> >     >     It looks like we only temporarily stop the gw from being
> >     >     exported. It does not update the gateway.cfg, because we do
> >     >     the config removal call on the node we want to delete.
> >     >
> >     >     So gwcli would report success and the ls command will show it
> >     >     as no longer running/exported, but if you restart the
> >     >     rbd-target-api service then it will show up again.
> >     >
> >     >     There is an internal command to do what you want. I will post
> >     >     a PR for gwcli so it can also be used by the dashboard.
> >     >
> >     >
> >     >     On 12/03/2019 01:19 PM, Jason Dillaman wrote:
> >     >     > If I recall correctly, the recent ceph-iscsi release
> >     >     > supports the removal of a gateway via "gwcli". I think the
> >     >     > Ceph dashboard can do that as well.
> >     >     >
> >     >     > On Tue, Dec 3, 2019 at 1:59 PM Wesley Dillingham
> >     >     > <[email protected]> wrote:
> >     >     >>
> >     >     >> We utilize 4 iSCSI gateways in a cluster and have noticed
> >     >     >> the following during patching cycles when we sequentially
> >     >     >> reboot single iSCSI gateways:
> >     >     >>
> >     >     >> "gwcli" often hangs on the still-up iSCSI GWs but
> >     >     >> sometimes still functions and gives the message:
> >     >     >>
> >     >     >> "1 gateway is inaccessible - updates will be disabled"
> >     >     >>
> >     >     >> This got me thinking about what the course of action would
> >     >     >> be should an iSCSI gateway fail permanently or
> >     >     >> semi-permanently, say from a hardware issue. What would be
> >     >     >> the best way to instruct the remaining iSCSI gateways that
> >     >     >> one of them is no longer available, and that they should
> >     >     >> allow updates again and take ownership of the
> >     >     >> now-defunct node's LUNs?
> >     >     >>
> >     >     >> I'm guessing pulling down the RADOS config object,
> >     >     >> rewriting it, and re-put'ing it, followed by an
> >     >     >> rbd-target-api restart, might do the trick, but I am
> >     >     >> hoping there is a more "in-band" and less potentially
> >     >     >> devastating way to do this.
> >     >     >>
> >     >     >> Thanks for any insights.
> >     >     >>
> >     >     >> Respectfully,
> >     >     >>
> >     >     >> Wes Dillingham
> >     >     >> [email protected] <mailto:[email protected]>
> >     <mailto:[email protected] <mailto:[email protected]>>
> >     >     >> LinkedIn
> >     >     >> _______________________________________________
> >     >     >> ceph-users mailing list -- [email protected]
> >     >     >> To unsubscribe send an email to [email protected]
> >     >     >
> >     >     >
> >     >     >
> >     >
> >
>
