On Wed, Nov 13, 2013 at 09:02:47AM +0300, Vladislav Bogdanov wrote:
> 13.11.2013 04:46, Jefferson Ogata wrote:
> ...
> >
> > In practice i ran into failover problems under load almost immediately.
> > Under load, when i would initiate a failover, there was a race
> > condition: the iSCSILogicalUnit RA will take down the LUNs one at a
> > time, waiting for each connection to terminate, and if the initiators
> > reconnect quickly enough, they get pissed off at finding that the target
> > still exists but the LUN they were using no longer does, which is often
> > the case during this transient takedown process. On the initiator, it
> > looks something like this, and it's fatal (here LUN 4 has gone away but
> > the target is still alive, maybe working on disconnecting LUN 3):
> >
> > Nov 7 07:39:29 s01c kernel: sd 6:0:0:4: [sde] Sense Key : Illegal
> > Request [current]
> > Nov 7 07:39:29 s01c kernel: sd 6:0:0:4: [sde] Add. Sense: Logical unit
> > not supported
> > Nov 7 07:39:29 s01c kernel: Buffer I/O error on device sde, logical
> > block 16542656
> >
> > One solution to this is using the portblock RA to block all initiator
>
> In addition I force use of multipath on initiators with no_path_retry=queue
>
> ...
>
> >
> > 1. Lack of support for multiple targets using the same tgt account. This
> > is a problem because the iSCSITarget RA defines the user and the target
> > at the same time. If it allowed multiple targets to use the same user,
> > it wouldn't know when it is safe to delete the user in a stop operation,
> > because some other target might still be using it.
> >
> > To solve this i did two things: first i wrote a new RA that manages a

Did I miss it, or did you post it somewhere?
Fork on Github and push there, so we can have a look?

> > tgt user; this is instantiated as a clone so it runs along with the tgtd
> > clone. Second i tweaked the iSCSITarget RA so that on start, if
> > incoming_username is defined but incoming_password is not, the RA skips
> > the account creation step and simply binds the new target to
> > incoming_username. On stop, it similarly no longer deletes the account
> > if incoming_password is unset. I also had to relax the uniqueness
> > constraint on incoming_username in the RA metadata.
> >
> > 2. Disappearing LUNs during failover cause initiators to blow chunks.
> > For this i used portblock, but had to modify it because the TCP Send-Q
> > would never drain.
> >
> > 3. portblock preventing TCP Send-Q from draining, causing tgtd
> > connections to hang. I modified portblock to reverse the sense of the
> > iptables rules it was adding: instead of blocking traffic from the
> > initiator on the INPUT chain, it now blocks traffic from the target on
> > the OUTPUT chain with a tcp-reset response. With this setup, as soon as
> > portblock goes active, the next packet tgtd attempts to send to a given
> > initiator will get a TCP RST response, causing tgtd to hang up the
> > connection immediately. This configuration allows the connections to
> > terminate promptly under load.
> >
> > I'm not totally satisfied with this workaround. It means
> > acknowledgements of operations tgtd has actually completed never make it
> > back to the initiator. I suspect this could cause problems in some
> > scenarios. I don't think it causes a problem the way i'm using it, with
> > each LUN as backing store for a distinct VM--when the LUN is back up on
> > the other node, the outstanding operations are re-sent by the initiator.
> > Maybe with a clustered filesystem this would cause problems; it
> > certainly would cause problems if the target device were, for example, a
> > tape drive.
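
For reference, here is a dry-run sketch of the two iptables approaches under discussion: the OUTPUT-chain reject with tcp-reset from the quoted item 3, and the alternative of dropping only new connection attempts raised in the reply just below. It only prints the commands and is not the portblock RA's actual code. `VIP`, `PORT`, and both function names are hypothetical; 3260 is assumed as the iSCSI port.

```shell
#!/bin/sh
# Dry-run sketch: prints iptables commands instead of executing them.
# VIP and PORT are hypothetical example values, not taken from the thread.
VIP="${VIP:-192.0.2.10}"
PORT="${PORT:-3260}"   # assumed iSCSI target port

# Quoted item 3: reject what the target sends on the OUTPUT chain and
# answer with a TCP RST, so tgtd hangs up the connection promptly.
rule_reject_output() {
    echo "iptables -I OUTPUT -p tcp -s $VIP --sport $PORT -j REJECT --reject-with tcp-reset"
}

# Alternative from the reply: drop only new connection attempts (SYN
# packets) on INPUT, leaving established connections free to drain.
rule_drop_new_input() {
    echo "iptables -I INPUT -p tcp -d $VIP --dport $PORT --syn -j DROP"
}

rule_reject_output
rule_drop_new_input
```

Whether the second variant avoids the Send-Q problem described above would need testing under load; the sketch only shows what the rules could look like.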

Maybe only block "new" incoming connection attempts?

> > 4. "Insufficient privileges" faults in the portblock RA. This was
> > another race condition that occurred because i was using multiple
> > targets, meaning that without a mutex, multiple portblock invocations
> > would be running in parallel during a failover. If you try to run
> > iptables while another iptables is running, you get "Resource not
> > available" and this was coming back to pacemaker as "insufficient
> > privileges". This is simply a bug in the portblock RA; it should have a
> > mutex to prevent parallel iptables invocations. I fixed this by adding
> > an ocf_release_lock_on_exit at the top, and adding an ocf_take_lock for
> > start, stop, monitor, and status operations.
> >
> > I'm not sure why more people haven't run into these problems before. I
> > hope it's not that i'm doing things wrong, but rather that few others
> > have earnestly tried to build anything quite like this setup. If
> > anyone out there has set up a similar cluster and *not* had these
> > problems, i'd like to know about it. Meanwhile, if others *have* had
> > these problems, i'd also like to know, especially if they've found
> > alternate solutions.
>
> Can't say about 1, I use IET, it doesn't seem to have that limitation.
> 2 - I use alternative home-brew ms RA which blocks (DROP) both input and
> output for a specified VIP on demote (targets are configured to be bound
> to that VIP). I also export one big LUN per target and then set up clvm
> VG on top of it (all initiators are in the same another cluster).
> 3 - can't say as well, IET is probably not affected.
> 4 - That is true, iptables doesn't have atomic rules management, so you
> definitely need mutex or dispatcher like firewalld (didn't try it though).

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
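
As a rough illustration of the serialization discussed in item 4 above, the following sketch wraps a command in a simple lock so that only one invocation runs at a time. It is a stand-in built on an atomic `mkdir`, not the `ocf_take_lock` / `ocf_release_lock_on_exit` change described in the quoted text. `LOCKDIR`, `take_lock`, `release_lock`, and `run_serialized` are hypothetical names.

```shell
#!/bin/sh
# Stand-in for the locking described in the quoted text; not the RA's code.
# LOCKDIR and the function names below are hypothetical.
LOCKDIR="${LOCKDIR:-${TMPDIR:-/tmp}/portblock-sketch.lock}"

take_lock() {
    # mkdir is atomic: exactly one caller succeeds in creating the directory.
    until mkdir "$LOCKDIR" 2>/dev/null; do
        sleep 1
    done
}

release_lock() {
    rmdir "$LOCKDIR" 2>/dev/null
}

# Run the given command while holding the lock; preserve its exit status.
run_serialized() {
    take_lock
    rc=0
    "$@" || rc=$?
    release_lock
    return "$rc"
}

# A real agent would run iptables here; echo keeps the sketch harmless.
run_serialized echo "would run iptables here"
```

One limitation of this sketch: if the wrapped command is killed before `release_lock` runs, the lock directory is left behind and has to be removed by hand.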
_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems