On Nov 12, 2012, at 6:33 AM, Gabriele Bulfon <[email protected]> wrote:

> Digging deeper into the problem with the Adaptec interface, I talked with the
> hardware guys,
> and someone told me that using this hardware with raidz may cause ZFS to hang:
> 
> SATA disks: Western Digital RAID Edition
> Adaptec 3805 (cache disabled).
> 
> I know that SAS disks are always a better solution than SATA,
> but this is the cheaper solution, and we opted for SATA.
> The hardware guy told me there is a possibility that, because of the
> nature of SATA,
> when using raidz, ZFS may get confused by a disk failure: not receiving a
> correct response from the
> controller, zfs commands may hang.
> 
> Is this correct?

Not exactly correct. ZFS relies on the underlying device drivers to handle
I/O timeouts. Until you get more information, it is not clear if the hang was
due to a disk, the controller, or something else.
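One place to start that digging: on illumos, the sd driver's per-command
I/O timeout is governed by the sd_io_time kernel variable (60 seconds by
default, as far as I recall). A quick, read-only peek with mdb:

  # show the current sd I/O timeout, in seconds, from the live kernel
  echo 'sd_io_time/D' | mdb -k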

> In case this is correct, it means I had a disk failure on the Adaptec
> controller, but since I could not
> log in (because of what Alasdair kindly noted), I had no way to see it.

I/O timeout/retries are recorded in FMA. Use fmdump -e to see the error reports.
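For example, a minimal sequence to review what FMA has recorded:

  fmdump -e      # one-line summary of each error report (ereport)
  fmdump -eV     # the same events with full detail
  fmadm faulty   # any faults the diagnosis engines have actually declared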

> Anyway, once the machine was reset, I got everything up and running, and the
> zpool looks fine.
> Should I run a scrub, or something else, to check for disk problems on that controller?

I would.
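For example, assuming the Adaptec pool is named "data1" (substitute your
real pool name):

  zpool scrub data1     # start a scrub of the whole pool
  zpool status data1    # watch scrub progress and per-device error counters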
 -- richard

> 
> Thanx again,
> Gabriele.
> 
> From: Gabriele Bulfon <[email protected]>
> To: [email protected]
> Date: 12 November 2012 14:43:45 CET
> Subject: Re: [discuss] illumos based ZFS storage failure
> 
> 
> Thanks for all the clues.
> 
> The Supermicro system is configured with 3 pools:
> - 1 boot pool, a ZFS mirror of 2 disks connected to the motherboard SATA ports
> - 1 data pool, a raidz of 8 disks connected to an Adaptec interface
> - 1 data pool, a raidz of 7 disks connected to an Areca interface
> 
> Each data pool also has a log device: an SSD split into two Solaris
> partitions, each functioning as the log device for one data pool.
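> (For reference, a split-SSD log layout like that is typically built with
> something along these lines; the pool names and the disk/slice names
> here are hypothetical:
> 
>   zpool add adaptecpool log c3t0d0s0   # first partition of the SSD
>   zpool add arecapool log c3t0d0s1     # second partition, other pool
> )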
> 
> Your question made me think of a possibility:
> - The only portions of the storage still responding were the NFS shares
> - The NFS shares are all on the Areca pool
> - The CIFS and iSCSI volumes are all on the Adaptec interface
> 
> Maybe the Adaptec pool had problems? Maybe the Adaptec interface itself had problems?
> At the moment I see nothing bad on it, but it is a possibility.
> 
> BTW, I understand that HA clustering and the two-headed Supermicro can
> help, but if the problem was just the ZFS iSCSI software not responding,
> I don't think hardware HA would have solved it.
> Don't you agree?
> 
> Gabriele.
> 
> 
> ----------------------------------------------------------------------------------
> 
> From: Jim Klimov <[email protected]>
> To: [email protected]
> Date: 10 November 2012 14:08:14 CET
> Subject: Re: [discuss] illumos based ZFS storage failure
> 
> On 2012-11-10 10:09, Gabriele Bulfon wrote:
> > Hi, the PDC system disk is not on the storage, just a 150GB partition
> > for databases.
> > That's why I can't see why Windows did not let me in, even on the VMware console.
> >
> > The requirement to have several DCs is a very nice trick from Microsoft
> > to sell more licenses...
> 
> This is quite a normal requirement for highly available infrastructure
> services. You likely have DNS replicas, or several equivalent SMTP
> relays, perhaps multi-mastered LDAP, and so on. Do you use clustered
> databases like PgSQL or MySQL?
> 
> That it costs extra money for some solutions is another matter.
> 
> I heard, but never got to check, that one of SAMBA4's goals was to
> replace the MS Domain Controllers in a manner compatible with MS AD
> (and backed by an LDAP service for HA storage of domain data).
> You might want to take a look at that and get a free solution, if
> it already works.
> 
> >
> > There is no zfs command in my .bashrc, but now you have opened my eyes:
> >
> > Just before entering the system via ssh, I tried to check the storage
> > via our web interface,
> > and it was responding correctly, until I went to the Pool management page,
> > where the web interface
> > issued a "zpool list", and it showed me the available pools.
> > Then I opened the tree to see the filesystems... and there it
> > stopped responding.
> >
> > At least I understand why I could not enter the system anymore (not even
> > on console...).
> >
> > Last questions:
> > - shouldn't I find some logs in the SMF logs of the iSCSI services? (I
> > don't...)
> 
> Maybe... unless the disk I/Os froze and the logs couldn't be saved.
> Are the rpool and data-pool drives (and pools) separate, or all in one?
> 
> I now wonder whether SMF can write logs to remote systems, like syslog...
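> (To locate the per-service logs, something like this should work; the
> service FMRI is an assumption - check "svcs -a | grep -i iscsi" for the
> exact name on your box:
> 
>   svcs -L network/iscsi/target   # print the path of the service's log file
>   svcs -xv                       # explain any services in trouble
> )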
> 
> > - should I raise the swap space? (it's now 4.5GB, phys memory is 8GB).
> 
> Depends on what your box is doing. If it is mostly ZFS storage with
> RAM going to the ARC cache, more swap likely won't help. If it has
> userspace tasks that may need or require disk-based swap guarantees
> (notably VirtualBox VMs) - you may need more swap, at least 1:1 with RAM.
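> (A sketch, assuming the stock rpool/swap zvol layout; adjust the names
> to your system:
> 
>   swap -l                              # list the current swap devices
>   zfs create -V 4G rpool/swap2         # carve a second swap zvol
>   swap -a /dev/zvol/dsk/rpool/swap2    # activate it
> )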
> 
> > - what may be the reasons for the pool failing? zpool status shows it's
> > all fine.
> 
> I'd bet on software problems - like running out of memory, or bugs
> in code - but have little proof and few testing techniques except trying
> to recreate the problem while monitoring the various stats closely.
> 
> It may also be that some disk in the pool timed out on responses
> and was not kicked out by the sd driver and/or ZFS timeouts...
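> (For the "monitoring closely" part, a minimal set of live views - the
> pool name "data1" is hypothetical:
> 
>   zpool iostat -v data1 5       # per-vdev ops/bandwidth every 5 seconds
>   iostat -xnz 5                 # per-disk service times; watch asvc_t and %b
>   echo ::memstat | mdb -k       # kernel memory breakdown, including ZFS
> )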
> 
> > - any other way I can prevent this from happening?
> 
> HA clustering, shared storage, detect a dead node and STONITH? ;)
> 
> Perhaps one of those two-motherboards-in-one-rack-case servers
> from Supermicro (with shared SAS buckets of drives) that Nexenta
> announced it was partnering on and recommending a while ago...
> 
> HTH,
> //Jim Klimov
> 
> 

--

[email protected]
+1-760-896-4422





