19.11.2013 13:48, Lars Ellenberg wrote:
> On Wed, Nov 13, 2013 at 09:02:47AM +0300, Vladislav Bogdanov wrote:
>> 13.11.2013 04:46, Jefferson Ogata wrote:
>> ...
>>>
>>> In practice i ran into failover problems under load almost immediately.
>>> Under load, when i would initiate a failover, there was a race
>>> condition: the iSCSILogicalUnit RA will take down the LUNs one at a
>>> time, waiting for each connection to terminate, and if the initiators
>>> reconnect quickly enough, they get pissed off at finding that the target
>>> still exists but the LUN they were using no longer does, which is often
>>> the case during this transient takedown process. On the initiator, it
>>> looks something like this, and it's fatal (here LUN 4 has gone away but
>>> the target is still alive, maybe working on disconnecting LUN 3):
>>>
>>> Nov 7 07:39:29 s01c kernel: sd 6:0:0:4: [sde] Sense Key : Illegal Request [current]
>>> Nov 7 07:39:29 s01c kernel: sd 6:0:0:4: [sde] Add. Sense: Logical unit not supported
>>> Nov 7 07:39:29 s01c kernel: Buffer I/O error on device sde, logical block 16542656
>>>
>>> One solution to this is using the portblock RA to block all initiator
>>
>> In addition I force use of multipath on initiators with no_path_retry=queue
>>
>> ...
>>
>>>
>>> 1. Lack of support for multiple targets using the same tgt account. This
>>> is a problem because the iSCSITarget RA defines the user and the target
>>> at the same time. If it allowed multiple targets to use the same user,
>>> it wouldn't know when it is safe to delete the user in a stop operation,
>>> because some other target might still be using it.
>>>
>>> To solve this i did two things: first i wrote a new RA that manages a
>
> Did I miss it, or did you post it somewhere?
> Fork on Github and push there, so we can have a look?
>
>>> tgt user; this is instantiated as a clone so it runs along with the tgtd
>>> clone.
>>> Second i tweaked the iSCSITarget RA so that on start, if
>>> incoming_username is defined but incoming_password is not, the RA skips
>>> the account creation step and simply binds the new target to
>>> incoming_username. On stop, it similarly no longer deletes the account
>>> if incoming_password is unset. I also had to relax the uniqueness
>>> constraint on incoming_username in the RA metadata.
>>>
>>> 2. Disappearing LUNs during failover cause initiators to blow chunks.
>>> For this i used portblock, but had to modify it because the TCP Send-Q
>>> would never drain.
>>>
>>> 3. portblock preventing TCP Send-Q from draining, causing tgtd
>>> connections to hang. I modified portblock to reverse the sense of the
>>> iptables rules it was adding: instead of blocking traffic from the
>>> initiator on the INPUT chain, it now blocks traffic from the target on
>>> the OUTPUT chain with a tcp-reset response. With this setup, as soon as
>>> portblock goes active, the next packet tgtd attempts to send to a given
>>> initiator will get a TCP RST response, causing tgtd to hang up the
>>> connection immediately. This configuration allows the connections to
>>> terminate promptly under load.
>>>
>>> I'm not totally satisfied with this workaround. It means
>>> acknowledgements of operations tgtd has actually completed never make it
>>> back to the initiator. I suspect this could cause problems in some
>>> scenarios. I don't think it causes a problem the way i'm using it, with
>>> each LUN as backing store for a distinct VM--when the LUN is back up on
>>> the other node, the outstanding operations are re-sent by the initiator.
>>> Maybe with a clustered filesystem this would cause problems; it
>>> certainly would cause problems if the target device were, for example, a
>>> tape drive.
>
> Maybe only block "new" incoming connection attempts?
>
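The account handling described in item 1 boils down to one rule: a target only creates and deletes the tgt account when it was given a password, and otherwise just binds to an account that some other resource owns. A minimal Python sketch of that start/stop decision, assuming the RA's behaviour can be reduced to a list of abstract steps. Only `incoming_username` and `incoming_password` come from the thread; `TargetParams`, `start_actions`, `stop_actions` and the step labels are hypothetical and do not reflect the RA's real shell code or tgtadm calls.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TargetParams:
    """Hypothetical model of the iSCSITarget RA parameters relevant to accounts.

    The real RA is a shell script that reads OCF_RESKEY_* variables.
    """
    incoming_username: Optional[str] = None
    incoming_password: Optional[str] = None


def start_actions(p: TargetParams) -> List[str]:
    """Account-related steps the tweaked RA would take on start."""
    if not p.incoming_username:
        return []  # no incoming authentication configured
    if p.incoming_password:
        # Password given: this resource owns the account, so create it first.
        return ["create_account", "bind_account"]
    # Username only: the account is managed elsewhere (the separate cloned
    # tgt-user resource), so just bind the new target to it.
    return ["bind_account"]


def stop_actions(p: TargetParams) -> List[str]:
    """Account-related steps the tweaked RA would take on stop."""
    if p.incoming_username and p.incoming_password:
        # Only delete an account this resource created itself.
        return ["delete_account"]
    # Shared (or absent) account: leave it for other targets still using it.
    return []
```

Because a username-only target never deletes the account, several targets can safely reference the same user, which is why the uniqueness constraint on `incoming_username` has to be relaxed.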
That may cause issues on the initiator side in some circumstances (IIRC):
* connection is established
* pacemaker fires target move
* target is destroyed, connection breaks (TCP RST is sent to initiator)
* initiator connects again
* target is not available at the iSCSI level (but portals answer on either the old or the new node), or portals are not available
* initiator *returns error* to an upper layer <- this one is important
* target is then configured on the other node

I was hit by this, but that was several years ago, so I may be missing some details.

My experience with IET and LIO shows it is better (safer) to block all iSCSI traffic to the target's portals, in both directions:
* connection is established
* pacemaker fires target move
* both directions are blocked (DROP) on both target nodes
* target is destroyed; the connection stays "established" on the initiator side, TCP packets just time out
* target is configured on the other node (VIPs are moved too)
* firewall rules are removed
* initiator (re)sends request
* target sends RST (?) back, since it doesn't have that connection
* initiator reconnects and continues to use the target

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
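The blocking strategies discussed in this thread (stock portblock dropping initiator traffic on INPUT, Jefferson's reversed OUTPUT rule with a `tcp-reset` response, Lars's "new connections only" idea, and Vladislav's DROP in both directions) differ mainly in which iptables rules get installed. A minimal Python sketch that only builds the argument vectors without running iptables, assuming a single portal VIP and the default iSCSI port 3260 (the thread names neither). `block_rules`, `unblock_rules` and the mode names are hypothetical; this is not portblock's implementation.

```python
from typing import List

ISCSI_PORT = 3260  # IANA-registered iSCSI port (assumption: portals use the default)


def block_rules(vip: str, mode: str, port: int = ISCSI_PORT) -> List[List[str]]:
    """Return iptables argument lists (without the leading "iptables")."""
    p = str(port)
    # Initiator -> portal traffic, and portal -> initiator traffic.
    inbound = ["-p", "tcp", "-d", vip, "--dport", p]
    outbound = ["-p", "tcp", "-s", vip, "--sport", p]
    if mode == "drop-inbound":
        # Stock behaviour as described: silently drop initiator traffic.
        return [["-I", "INPUT"] + inbound + ["-j", "DROP"]]
    if mode == "reset-outbound":
        # Reversed sense: reject the target's own outgoing packets with a TCP RST,
        # which (per the report above) makes tgtd hang up instead of stalling.
        return [["-I", "OUTPUT"] + outbound
                + ["-j", "REJECT", "--reject-with", "tcp-reset"]]
    if mode == "drop-new":
        # Only refuse new connection attempts (SYN packets).
        return [["-I", "INPUT"] + inbound + ["--syn", "-j", "DROP"]]
    if mode == "drop-both":
        # Drop both directions so the initiator only sees TCP timeouts.
        return [
            ["-I", "INPUT"] + inbound + ["-j", "DROP"],
            ["-I", "OUTPUT"] + outbound + ["-j", "DROP"],
        ]
    raise ValueError(f"unknown mode: {mode}")


def unblock_rules(rules: List[List[str]]) -> List[List[str]]:
    """Turn each "-I CHAIN spec" into the matching "-D CHAIN spec"."""
    return [["-D"] + r[1:] for r in rules]


if __name__ == "__main__":
    for rule in block_rules("192.0.2.10", "drop-both"):
        print("iptables " + " ".join(rule))
```

Following the sequence above, the `drop-both` rules would be inserted on both nodes before the target is stopped, and removed with `unblock_rules` only after the target and VIPs are up on the other node. The address 192.0.2.10 is a documentation placeholder.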
