Re: [Linux-HA] drbd/pacemaker multiple tgt targets, portblock, and race conditions (long-ish)

Lars Ellenberg Tue, 19 Nov 2013 02:50:20 -0800

On Wed, Nov 13, 2013 at 09:02:47AM +0300, Vladislav Bogdanov wrote:
> 13.11.2013 04:46, Jefferson Ogata wrote:
> ...
> > 
> > In practice i ran into failover problems under load almost immediately.
> > Under load, when i would initiate a failover, there was a race
> > condition: the iSCSILogicalUnit RA will take down the LUNs one at a
> > time, waiting for each connection to terminate, and if the initiators
> > reconnect quickly enough, they get pissed off at finding that the target
> > still exists but the LUN they were using no longer does, which is often
> > the case during this transient takedown process. On the initiator, it
> > looks something like this, and it's fatal (here LUN 4 has gone away but
> > the target is still alive, maybe working on disconnecting LUN 3):
> > 
> > Nov  7 07:39:29 s01c kernel: sd 6:0:0:4: [sde] Sense Key : Illegal Request [current]
> > Nov  7 07:39:29 s01c kernel: sd 6:0:0:4: [sde] Add. Sense: Logical unit not supported
> > Nov  7 07:39:29 s01c kernel: Buffer I/O error on device sde, logical block 16542656
> > 
> > One solution to this is using the portblock RA to block all initiator
> 
> In addition I force use of multipath on initiators with no_path_retry=queue
> 
> ...
> 
> > 
> > 1. Lack of support for multiple targets using the same tgt account. This
> > is a problem because the iSCSITarget RA defines the user and the target
> > at the same time. If it allowed multiple targets to use the same user,
> > it wouldn't know when it is safe to delete the user in a stop operation,
> > because some other target might still be using it.
> > 
> > To solve this i did two things: first i wrote a new RA that manages a


Did I miss it, or did you post it somewhere?
Fork on GitHub and push there, so we can have a look?

> > tgt user; this is instantiated as a clone so it runs along with the tgtd
> > clone. Second i tweaked the iSCSITarget RA so that on start, if
> > incoming_username is defined but incoming_password is not, the RA skips
> > the account creation step and simply binds the new target to
> > incoming_username. On stop, it similarly no longer deletes the account
> > if incoming_password is unset. I also had to relax the uniqueness
> > constraint on incoming_username in the RA metadata.
> > 
> > 2. Disappearing LUNs during failover cause initiators to blow chunks.
> > For this i used portblock, but had to modify it because the TCP Send-Q
> > would never drain.
> > 
> > 3. portblock preventing TCP Send-Q from draining, causing tgtd
> > connections to hang. I modified portblock to reverse the sense of the
> > iptables rules it was adding: instead of blocking traffic from the
> > initiator on the INPUT chain, it now blocks traffic from the target on
> > the OUTPUT chain with a tcp-reset response. With this setup, as soon as
> > portblock goes active, the next packet tgtd attempts to send to a given
> > initiator will get a TCP RST response, causing tgtd to hang up the
> > connection immediately. This configuration allows the connections to
> > terminate promptly under load.
> > 
> > I'm not totally satisfied with this workaround. It means
> > acknowledgements of operations tgtd has actually completed never make it
> > back to the initiator. I suspect this could cause problems in some
> > scenarios. I don't think it causes a problem the way i'm using it, with
> > each LUN as backing store for a distinct VM--when the LUN is back up on
> > the other node, the outstanding operations are re-sent by the initiator.
> > Maybe with a clustered filesystem this would cause problems; it
> > certainly would cause problems if the target device were, for example, a
> > tape drive.

Maybe only block "new" incoming connection attempts?
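
To make the two variants concrete, here is a minimal sketch that only
builds the iptables command lines and prints them; it does not run
them. The first function follows the OUTPUT/tcp-reset description in
point 3 above, the second is one possible reading of "only block new
connections" (matching SYN packets on INPUT). The function names, the
example address 192.0.2.10 and the use of port 3260 are assumptions
for illustration, not taken from the modified portblock RA, and
whether the second variant actually closes the race is untested.

```python
import shlex

ISCSI_PORT = 3260  # assumed: the default iSCSI port


def reject_output_rule(vip, port=ISCSI_PORT):
    """Point 3 as described: reset anything the target tries to send."""
    return ["iptables", "-I", "OUTPUT", "-p", "tcp",
            "-s", vip, "--sport", str(port),
            "-j", "REJECT", "--reject-with", "tcp-reset"]


def drop_new_input_rule(vip, port=ISCSI_PORT):
    """Alternative: drop only new connection attempts (SYN packets),
    leaving established connections alone so they can drain."""
    return ["iptables", "-I", "INPUT", "-p", "tcp",
            "-d", vip, "--dport", str(port),
            "--syn", "-j", "DROP"]


def render(argv):
    """Turn an argument list into a shell-safe command line."""
    return " ".join(shlex.quote(arg) for arg in argv)


if __name__ == "__main__":
    for build in (reject_output_rule, drop_new_input_rule):
        print(render(build("192.0.2.10")))
```

With the second rule, acknowledgements for completed operations could
still reach the initiator, while reconnect attempts are held off until
the rule is removed on the new node.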

> > 4. "Insufficient privileges" faults in the portblock RA. This was
> > another race condition that occurred because i was using multiple
> > targets, meaning that without a mutex, multiple portblock invocations
> > would be running in parallel during a failover. If you try to run
> > iptables while another iptables is running, you get "Resource not
> > available" and this was coming back to pacemaker as "insufficient
> > privileges". This is simply a bug in the portblock RA; it should have a
> > mutex to prevent parallel iptables invocations. I fixed this by adding
> > an ocf_release_lock_on_exit at the top, and adding an ocf_take_lock for
> > start, stop, monitor, and status operations.
> >
> > I'm not sure why more people haven't run into these problems before. I
> > hope it's not that i'm doing things wrong, but rather that few others
> > have earnestly tried to build anything quite like this setup. If
> > anyone out there has set up a similar cluster and *not* had these
> > problems, i'd like to know about it. Meanwhile, if others *have* had
> > these problems, i'd also like to know, especially if they've found
> > alternate solutions.
> 
> Can't say about 1, I use IET, it doesn't seem to have that limitation.
> 2 - I use an alternative home-brew ms RA which blocks (DROP) both input and
> output for a specified VIP on demote (targets are configured to be bound
> to that VIP). I also export one big LUN per target and then set up a clvm
> VG on top of it (all initiators are in the same, separate cluster).
> 3 - can't say either, IET is probably not affected.
> 4 - That is true, iptables doesn't have atomic rule management, so you
> definitely need a mutex or a dispatcher like firewalld (didn't try it though).
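
For point 4, the mutex idea can be shown in isolation. This is a
minimal sketch using an exclusive flock on a lock file, so that only
one holder at a time gets to run its iptables calls. It is not a
rendering of ocf_take_lock or ocf_release_lock_on_exit; the helper
names and the lock file location are hypothetical.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def exclusive_lock(path):
    """Hold an exclusive flock on `path`; blocks while another holder has it."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)
        yield fd
    finally:
        # Closing the descriptor also releases the lock.
        os.close(fd)


def is_locked(path):
    """Probe via a second descriptor: True if the lock is currently held."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        return True
    else:
        return False
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Hypothetical lock file; a real RA would use a fixed, shared path.
    lock_path = os.path.join(tempfile.mkdtemp(), "portblock.lock")
    with exclusive_lock(lock_path):
        # The iptables invocations would go here, serialized across callers.
        print("locked:", is_locked(lock_path))
    print("locked:", is_locked(lock_path))
```

The point of wrapping every action (start, stop, monitor, status) in
the same lock is that parallel invocations for different targets then
queue up instead of colliding inside iptables.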

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
