On Nov 12, 2012, at 6:33 AM, Gabriele Bulfon <[email protected]> wrote:
> Going deeper into the problem with the Adaptec interface, I talked with the
> hardware guys, and someone told me that using this hardware / raidz may
> cause zfs to hang:
> 
> SATA disks: Western Digital RAID Edition
> Adaptec 3805 (cache disabled)
> 
> I know that SAS disks are always a better solution than SATA.
> But this is a cheaper solution, and we opted for SATA.
> The hardware guy told me that there is the possibility that, because of the
> nature of SATA, when using raidz, zfs may get mad in case of a disk failure:
> not receiving a correct response from the controller, zfs commands may hang.
> 
> Is this correct?

Not exactly correct. ZFS relies on the underlying device drivers to handle
I/O timeouts. Until you get more information, it is not clear if the hang
was due to a disk, the controller, or something else.

> In case this is correct, it means I had a disk failure on the Adaptec
> controller, but having no access to login (because of what Alasdair kindly
> noted) I had no way to see it.

I/O timeouts/retries are recorded in FMA. Use fmdump -e to see the error
reports.

> Anyway, once the machine reset, I got everything up and running, and the
> zpool looks fine.
> Should I run a scrub, or something else, to check for disk problems on that
> controller?

I would.
 -- richard

> Thanx again,
> Gabriele.
> 
> 
> From: Gabriele Bulfon <[email protected]>
> To: [email protected]
> Date: 12 November 2012 14.43.45 CET
> Subject: Re: [discuss] illumos based ZFS storage failure
> 
> Thanks for all the clues.
> 
> The Supermicro system is configured with 3 pools:
> - 1 boot pool on a zfs mirror of 2 disks connected to the motherboard SATA
> - 1 data pool on a raidz of 8 disks connected to an Adaptec interface
> - 1 data pool on a raidz of 7 disks connected to an Areca interface
> 
> Each data pool also has a log device: an SSD disk split into two Solaris
> partitions, each functioning as a log device for one data pool.
> 
> Your question made me think about a possibility:
> - The only portions of the storage still responding were the NFS shares
> - The NFS shares are all on the Areca pool
> - The CIFS and iSCSI volumes are all on the Adaptec interface
> 
> Maybe the Adaptec pool had problems? Maybe the Adaptec interface had
> problems? At the moment I see nothing bad on it. But this may be a
> possibility.
> 
> BTW, I understand that HA clustering and the two-headed Supermicro can help,
> but if the problem was just the zfs iSCSI software not responding, I don't
> think the hardware HA would have solved it.
> Don't you think so?
> 
> Gabriele.
> 
> ----------------------------------------------------------------------------------
> 
> From: Jim Klimov <[email protected]>
> To: [email protected]
> Date: 10 November 2012 14.08.14 CET
> Subject: Re: [discuss] illumos based ZFS storage failure
> 
> On 2012-11-10 10:09, Gabriele Bulfon wrote:
> > Hi, the PDC system disk is not on the storage, just a 150GB partition
> > for databases.
> > That's why I can't see how Windows did not let me in even on the vmware
> > console.
> >
> > The requirement to have several DCs is a very nice trick from Microsoft
> > to get more licenses...
> 
> This is quite a normal requirement for highly available infrastructure
> services. You likely have DNS replicas, or several equivalent SMTP
> relays, perhaps multi-mastered LDAP and so on? Do you use clustered
> databases like PgSQL or MySQL?
> 
> That it costs extra money for some solutions is another matter.
> 
> I heard, but never got to check, that one of SAMBA4's goals was to
> replace the MS Domain Controllers in a manner compatible with MS AD
> (and backed by an LDAP service for HA storage of domain data).
> You might want to take a look at that and get a free solution, if
> it already works.
> 
> > There is no zfs command running in my .bashrc but, now you opened my eyes:
> >
> > Just before entering the system via ssh, I tried to check the storage
> > via our web interface, and it was correctly responding, until I went to
> > the Pool management, where the web interface issued a "zpool list", and
> > it showed me the available pools.
> > Then I opened the tree to see the filesystems... and there it stopped
> > responding...
> >
> > At least I understand why I could not enter the system anymore (not even
> > on console...).
> >
> > Last questions:
> > - shouldn't I find some logs in the svc/logs of the iscsi services?
> > (I don't...)
> 
> Maybe... unless the disk IOs froze and couldn't save the logs.
> Are the rpool and data pool drives (and pools) separate, or all in one?
> 
> I wonder now, if SMF can write logs to remote systems, like syslog...
> 
> > - should I raise the swap space? (it's now 4.5GB, phys memory is 8GB)
> 
> Depends on what your box is doing. If it is mostly ZFS storage with
> RAM going to the ARC cache, swap likely won't help. If it has userspace
> tasks that may need or require disk-based swap guarantees (notably
> VirtualBox VMs) - you may need more swap, at least 1:1 with RAM.
> 
> > - what may be the reasons for the pool failing? a zpool status shows
> > it's all fine.
> 
> I'd bet on software problems - like running out of memory, or bugs
> in code - but have little proof or testing techniques except trying
> to recreate the problem while monitoring the various stats closely.
> 
> It may also be that some disk in the pool timed out on responses
> and was not kicked out by the SD driver and/or ZFS timeouts...
> 
> > - any other way I can prevent this from happening?
> 
> HA clustering, shared storage, detect a dead node and STONITH? ;)
> 
> Perhaps one of those two-motherboards-in-one-rackcase servers
> from Supermicro (with shared SAS buckets of drives) that Nexenta
> announced partnering with and recommending a while ago...
> 
> HTH,
> //Jim Klimov

-- 
[email protected]
+1-760-896-4422
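
For reference, a minimal sketch of the checks Richard suggests above, on an
illumos box. 'tank' is only a stand-in here; the actual pool names are not
given in the thread.

    # FMA error reports -- disk I/O timeouts/retries show up here as ereports
    fmdump -e                 # one-line summary per error report
    fmdump -eV | less         # full detail, including the affected device
    fmadm faulty              # anything FMA has actually diagnosed as faulted

    # Per-device error counters as seen by the sd driver
    iostat -En

    # Scrub the suspect pool and watch its progress
    zpool scrub tank
    zpool status -v tank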

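Along the lines of Jim's suggestions, a few quick ways to locate the SMF
service logs and to check swap and memory use. The iSCSI FMRI below assumes
the stock COMSTAR target service; adjust it for the actual setup.

    # Print the path of the iSCSI target service's SMF log file
    svcs -L svc:/network/iscsi/target:default
    svcs -xv                  # any services in maintenance, with log paths

    # Configured swap devices and current swap reservation
    swap -l
    swap -s

    # Rough breakdown of where RAM is going (kernel, ZFS file data,
    # anonymous/userland, free); needs root
    echo ::memstat | mdb -k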