Thanks for all the clues.

The Supermicro system is configured with 3 pools:
- 1 boot pool on a ZFS mirror of 2 disks connected to the motherboard SATA
- 1 data pool on a raidz of 8 disks connected to an Adaptec interface
- 1 data pool on a raidz of 7 disks connected to an Areca interface

Each data pool also has a log device: one SSD is split into two Solaris partitions, each serving as the log device for one data pool.

Your question made me think about a possibility:
- The only portions of the storage still responding were the NFS shares
- The NFS shares are all on the Areca pool
- The CIFS and iSCSI volumes are all on the Adaptec interface

Maybe the Adaptec pool had problems? Maybe the Adaptec interface had problems?
At the moment I see nothing wrong with it, but it is a possibility.

BTW, I understand that HA clustering and the two-head Supermicro can help, but if the problem was just the ZFS iSCSI software not responding, I don't think hardware HA would have solved it. Don't you think so?

Gabriele.

----------------------------------------------------------------------------------
From: Jim Klimov
To: [email protected]
Date: 10 November 2012 14:08:14 CET
Subject: Re: [discuss] illumos based ZFS storage failure

On 2012-11-10 10:09, Gabriele Bulfon wrote:
> Hi,
> the PDC system disk is not on the storage, just a 150GB partition for databases. That's why I can't see how Windows did not let me in even on the VMware console.
> The requirement to have several DCs is a very nice trick from Microsoft to get more licenses...

This is quite a normal requirement for highly available infrastructure services. You likely do have DNS replicas, or several equivalent SMTP relays, perhaps multi-mastered LDAP and so on? Do you use clustered databases like PgSQL or MySQL? That it costs extra money for some solutions is another matter.
I heard, but never got to check, that one of SAMBA4's goals was to replace MS Domain Controllers in a manner compatible with MS AD (and backed by an LDAP service for HA storage of domain data). You might want to take a look at that and get a free solution, if it already works.

> There is no zfs command running in my .bashrc but, now you opened my eyes:
> just before entering the system via ssh, I tried to check the storage via our web interface, and it was responding correctly until I went to the Pool management page, where the web interface issued a "zpool list" and showed me the available pools. Then I opened the tree to see the filesystems... and there it stopped responding.
> At least I understand why I could not enter the system anymore (not even on console...).
> Last questions:
> - shouldn't I find some logs in the svc/logs of the iscsi services? (I don't...)

Maybe... unless the disk I/Os froze and the logs couldn't be saved. Are the rpool and data pool drives (and pools) separate, or all in one? I wonder now whether SMF can write logs to remote systems, like syslog...

> - should I raise the swap space? (it's now 4.5GB, phys memory is 8GB).

Depends on what your box is doing. If it is mostly ZFS storage with RAM going to the ARC cache, swap likely won't help. If it has userspace tasks that may need or require disk-based swap guarantees (notably VirtualBox VMs), you may need more swap, at least 1:1 with RAM.

> - what may be the reasons for the pool failing? A zpool status shows it's all fine.

I'd bet on software problems, like running out of memory or bugs in code, but I have little proof or testing techniques except trying to recreate the problem while monitoring the various stats closely. It may also be that some disk in the pool timed out on responses and was not kicked out by the SD driver and/or ZFS timeouts...

> - any other way I can prevent this from happening?

HA clustering, shared storage, detect a dead node and STONITH?
;) Perhaps one of those two-motherboards-in-one-rackcase servers from Supermicro (with shared SAS buckets of drives) that Nexenta announced partnering with and recommending a while ago...

HTH,
//Jim Klimov

-------------------------------------------
illumos-discuss
Archives: https://www.listbox.com/member/archive/182180/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182180/21175541-02f10c6f
Modify Your Subscription: https://www.listbox.com/member/?&id;secret=21175541-29e3e0ee
Powered by Listbox: http://www.listbox.com
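
The hang described in this thread showed up when a ZFS listing was issued from the web interface, and the suggestion was to recreate the problem while monitoring closely. One way to do that without locking up the monitoring session itself is to run the pool query from a small script with a timeout. What follows is only a minimal Python sketch, not something used on the system discussed: `probe` and `main` are hypothetical names, the 10-second timeout is an arbitrary assumption, and it assumes `zpool` is on the PATH.

```python
import subprocess
import sys


def probe(cmd, timeout_s):
    """Run cmd and classify the outcome as 'ok', 'failed', 'hung' or 'missing'."""
    try:
        proc = subprocess.Popen(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
    except OSError:
        # The binary could not be started at all (not installed, not executable).
        return "missing"
    try:
        rc = proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # Request a kill but do NOT wait for the child afterwards: a process
        # blocked in the kernel on stuck I/O may never exit, and waiting for
        # it would hang this script as well.
        proc.kill()
        return "hung"
    return "ok" if rc == 0 else "failed"


def main():
    # `zpool status -x` limits the output to pools that are not healthy.
    # Here only "did it return in time" is checked, not pool health itself.
    state = probe(["zpool", "status", "-x"], timeout_s=10.0)
    print(f"zpool status probe: {state}", file=sys.stderr)
    return 0 if state == "ok" else 1
```

Calling `main()` periodically (from cron or a loop) and logging the result to another machine would at least record when pool commands stopped returning, even if local logs could not be written.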
