comment below...

On Oct 3, 2012, at 11:03 AM, Edward Ned Harvey (openindiana) 
<[email protected]> wrote:

>> From: Edward Ned Harvey (lopser) [mailto:[email protected]]
>> Sent: Monday, October 01, 2012 12:18 PM
>> 
>> When creating zpool, don't use the local device name (c4t2d0 or whatever)
>> because that's not available to the other system.
>> Use only the iscsi target device names.
> 
> I am revisiting this issue today.  I've tried everything I can think of to 
> recreate this issue, and haven't been able to do it.  I have certainly 
> encountered some bad behaviors - which I'll expound upon momentarily - but 
> they all seem to be addressable, fixable, logical problems, and none of them 
> result in a supposedly good pool (as reported in zpool status) returning scsi 
> IO errors or halting the system.  The most likely explanation right now, for 
> the bad behavior I saw before, perpetual IO error even after restoring 
> connection, is that I screwed something up in my iscsi config the first time. 
>  In fact, I suspect, the most likely error I made is the one mentioned above. 
>  Only use the iscsi multipath device names, not the local device names.  I 
> learned that the hard way, wrote down this comment, attempted to 
> rebuild/recreate, thought I succeeded, and perhaps didn't.
> 
> In any event, I've rebuilt the servers both from scratch now, and definitely 
> cleaned up any possible errors in my iscsi config that might have previously 
> existed.
> 
> Now, the state of the world is:
> 
> I have 4 disks on each machine.  They are all iscsi targets.  Each machine 
> connects to itself and the other machine, so each machine can see 8 disks (4 
> local, 4 remote.)  Create a pool that mirrors local disks against remote 
> disks.  Only import the pool on one server.  Rest assured, if the one server 
> dies, you could force import using the other system, if necessary.
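In sketch form, the layout described above might look like this (a hedged, untested sketch: the pool name "tank" and the WWN-style device names are invented placeholders):

```shell
# Hypothetical sketch of the layout above: each mirror pairs one local
# iSCSI disk with one remote iSCSI disk. The WWN-style device names are
# invented; use the iSCSI multipath names reported by `format`, never
# the plain local c#t#d# names, so both servers see the same names.
zpool create tank \
  mirror c0t600144F0AAAA0001d0 c0t600144F0BBBB0001d0 \
  mirror c0t600144F0AAAA0002d0 c0t600144F0BBBB0002d0 \
  mirror c0t600144F0AAAA0003d0 c0t600144F0BBBB0003d0 \
  mirror c0t600144F0AAAA0004d0 c0t600144F0BBBB0004d0
```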
> 
> Herein lie the new problems:
> 
> If I don't export the pool before rebooting, then either the iscsi target or 
> initiator is shutdown before the filesystems are unmounted.  So the system 
> spews all sorts of error messages while trying to go down, but it eventually 
> succeeds.  It's somewhat important to know if it was the target or initiator 
> that went down first - If it was the target, then only the local disks became 
> inaccessible, but if it was the initiator, then both the local and remote 
> disks became inaccessible.  I don't know yet.
> 
> Upon reboot, the pool fails to import, so the svc:/system/filesystem/local 
> service fails and comes up in maintenance mode.  The whole system is a mess: 
> you have to log in at the physical text console to export the pool and 
> reboot.  But it comes up cleanly the second time.
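As commands, the manual recovery described here might look like the following (a sketch; "tank" is a placeholder pool name):

```shell
# Sketch of the recovery above: export the pool that failed to import,
# clear the failed service out of maintenance, then reboot.
zpool export tank
svcadm clear svc:/system/filesystem/local
reboot
```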
> 
> These sorts of problems seem like they should be solvable by introducing some 
> service manifest dependencies...  But there's no way to make it a 
> generalization for the distribution as a whole (illumos/openindiana/oracle).  
> It's just something that should be solvable on a case-by-case basis.
> 
> If you are going to be an initiator only, then it makes sense for 
> svc:/network/iscsi/initiator to be required by svc:/system/filesystem/local 
> 
> If you are going to be a target only, then it makes sense for 
> svc:/system/filesystem/local to be required by svc:/network/iscsi/target
> 
> If you are going to be a target & initiator, then you could get yourself into 
> a deadlock situation.  Make the filesystem depend on the initiator, and make 
> the initiator depend on the target, and make the target depend on the 
> filesystem.  Uh-oh.
> 
> But we can break that cycle easily enough in a lot of situations - If you're 
> doing as I'm doing, where the only targets are raw devices (not zvols) then 
> it should be ok to make the filesystem depend on the initiator, which depends 
> on the target, and the target doesn't depend on anything.
> 
> If you're both a target and an initiator, but all of your targets are zvols 
> that you export to other systems (you're not nesting a filesystem in a zvol 
> of your own, are you?) then it's ok to make the target depend on the 
> filesystem, and the filesystem depend on the initiator, while the initiator 
> doesn't depend on anything.
> 
> So in my case, I'm sharing raw disks, so I'm going to try to make the 
> filesystem need the initiator, the initiator need the target, and the target 
> need nothing.
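The dependency chain described here could be sketched with svccfg (a hedged, untested sketch; the property-group names "iscsi_init" and "iscsi_tgt" are arbitrary choices, not existing names):

```shell
# Sketch of the chain above: filesystem/local requires the iSCSI
# initiator, which requires the iSCSI target; the target requires
# nothing extra, which breaks the potential dependency cycle.
svccfg -s svc:/system/filesystem/local <<'EOF'
addpg iscsi_init dependency
setprop iscsi_init/grouping = astring: require_all
setprop iscsi_init/restart_on = astring: none
setprop iscsi_init/type = astring: service
setprop iscsi_init/entities = fmri: svc:/network/iscsi/initiator
EOF
svccfg -s svc:/network/iscsi/initiator <<'EOF'
addpg iscsi_tgt dependency
setprop iscsi_tgt/grouping = astring: require_all
setprop iscsi_tgt/restart_on = astring: none
setprop iscsi_tgt/type = astring: service
setprop iscsi_tgt/entities = fmri: svc:/network/iscsi/target
EOF
svcadm refresh svc:/system/filesystem/local svc:/network/iscsi/initiator
```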

Actually, you will discover that SMF is not the right answer. It doesn't know 
anything about the state of the other machine. Hence, we have HA cluster 
solutions that do know how to manage pools shared across systems.

NB, when the greenline team was developing SMF, I noted that the SMF state 
machine is staggeringly similar to the Sun Cluster resource group state 
machine. NIH prevailed :-(
 -- richard

> 
> Haven't tried yet ... Hopefully google will help accelerate me figuring out 
> how to do that.
> 
> 
> 
> -------------------------------------------
> illumos-discuss
> Archives: https://www.listbox.com/member/archive/182180/=now
> RSS Feed: https://www.listbox.com/member/archive/rss/182180/21175743-23d1427b
> Modify Your Subscription: https://www.listbox.com/member/?&;
> Powered by Listbox: http://www.listbox.com

--

[email protected]
+1-760-896-4422





