Re: [OmniOS-discuss] iscsi timeouts
On 2/3/14, 10:51 AM, Tobias Oetiker wrote:

> a short update on the matter for anyone browsing the ML archives: the affected system runs on an S2600CP motherboard with RMM4 remote management. RMM4 comes with the ability to use any of the existing Ethernet ports on the MB for its communication needs ... we have configured it with a separate hw port, but it seems that this ability to access the other ports can interfere with OmniOS operation.
>
> 10 days ago we upgraded the BIOS to version SE5C600.86B.02.01.0002.082220131453 08/22/2013, and since then we have not seen any issues ... I am not 100% sure that this is the solution to the problem, as we only found the behaviour after several weeks of uptime ... in any event, for now things look good.

Interesting observation, thanks for keeping the list updated!

Best wishes,
--
Saso

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss
[OmniOS-discuss] iscsi timeouts
Hi,

we are serving iSCSI volumes from our OmniOS box ... in the log on the client I keep seeing this pattern every few hours. Any idea what could be causing this? Server and client are connected directly via a crossover cable over a dedicated interface.

Jan 21 01:21:34 iscsi-client kernel: : [1048707.604535] connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4557604012, last ping 4557605264, now 4557606516 [kern.err]
Jan 21 01:21:34 iscsi-client kernel: : [1048707.604656] connection1:0: detected conn error (1011) [kern.info]
Jan 21 01:21:34 iscsi-client kernel: : [1048707.604661] connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4557604012, last ping 4557605264, now 4557606516 [kern.err]
Jan 21 01:21:34 iscsi-client kernel: : [1048707.604763] connection2:0: detected conn error (1011) [kern.info]
Jan 21 01:21:35 iscsi-client iscsid: Kernel reported iSCSI connection 1:0 error (1011) state (3) [daemon.warning]
Jan 21 01:21:35 iscsi-client iscsid: Kernel reported iSCSI connection 2:0 error (1011) state (3) [daemon.warning]
Jan 21 01:21:57 iscsi-client kernel: : [1048713.496478] nfs: server 10.10.10.1 not responding, still trying [kern.notice]
Jan 21 01:21:57 iscsi-client kernel: : [1048717.843552] nfs: server 10.10.10.1 not responding, still trying [kern.notice]
Jan 21 01:21:57 iscsi-client kernel: : [1048718.087086] nfs: server 10.10.10.1 not responding, still trying [kern.notice]
Jan 21 01:21:57 iscsi-client kernel: : [1048730.558551] nfs: server 10.10.10.1 OK [kern.notice]
Jan 21 01:21:57 iscsi-client kernel: : [1048730.559623] nfs: server 10.10.10.1 OK [kern.notice]
Jan 21 01:21:57 iscsi-client kernel: : [1048730.559654] nfs: server 10.10.10.1 OK [kern.notice]
Jan 21 01:21:59 iscsi-client iscsid: connection1:0 is operational after recovery (2 attempts) [daemon.warning]
Jan 21 01:21:59 iscsi-client iscsid: connection2:0 is operational after recovery (2 attempts) [daemon.warning]

cheers
tobi
--
Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland
http://it.oetiker.ch t...@oetiker.ch ++41 62 775 9902 / sb: -9900
Re: [OmniOS-discuss] iscsi timeouts
It looks like your NFS is dropping as well (but then recovering), so I wouldn't pin the problem solely on iSCSI. The problem could be anywhere from the network driver all the way back to the switch/cables etc. You'll need to go through each item methodically to find the root cause.

On 21/01/2014, at 8:04 PM, Tobias Oetiker wrote:

> Hi, we are serving iSCSI volumes from our OmniOS box ... in the log on the client I keep seeing this pattern every few hours. Any idea what could be causing this? Server and client are connected directly via a crossover cable over a dedicated interface.
> [...]
Re: [OmniOS-discuss] iscsi timeouts
We've seen problems like this when we have a SATA drive in a SAS expander that is going out to lunch. Are there any drives showing errors in iostat -En? Or any drive timeout messages in the ring buffer?
 -nld

On Tue, Jan 21, 2014 at 3:04 AM, Tobias Oetiker t...@oetiker.ch wrote:

> Hi, we are serving iSCSI volumes from our OmniOS box ... in the log on the client I keep seeing this pattern every few hours. Any idea what could be causing this?
> [...]
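For readers following along, the `iostat -En` check suggested above can be scripted as a quick filter that prints only devices with non-zero error counters. This is a sketch: the field positions are assumed from the usual illumos `iostat -En` summary line ("device Soft Errors: N Hard Errors: N Transport Errors: N"), so verify them against your own output. The here-doc stands in for real `iostat -En` output; in practice you would pipe `iostat -En` into the same awk program.

```shell
# Print devices whose soft/hard/transport error counters are non-zero.
# Real use:  iostat -En | awk '/Soft Errors:/ { if ($4 + $7 + $10 > 0) print $1 }'
awk '/Soft Errors:/ { if ($4 + $7 + $10 > 0) print $1 }' <<'EOF'
c0t0d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
c0t1d0 Soft Errors: 0 Hard Errors: 3 Transport Errors: 1
EOF
```

A drive that shows up here (or in the kernel ring buffer with timeout messages) is worth pulling for closer inspection even if SMART still claims it is healthy.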
Re: [OmniOS-discuss] iscsi timeouts
On Jan 21, 2014, at 7:21 AM, Narayan Desai narayan.de...@gmail.com wrote:

> We've seen problems like this when we have a SATA drive in a SAS expander that is going out to lunch. Are there any drives showing errors in iostat -En? Or any drive timeout messages in the ring buffer?

Generally speaking -- you use a SATA drive behind a SAS expander at your own risk. I used to be at Nexenta, and they would not support customers who deployed SATA drives on SAS expanders. These days, the price delta between SAS and SATA (for enterprise drives) is small enough to be worth it for the headaches you avoid.

Dan
Re: [OmniOS-discuss] iscsi timeouts
Today Dan McDonald wrote:

> Generally speaking -- you use a SATA drive behind a SAS expander at your own risk. I used to be at Nexenta, and they would not support customers who deployed SATA drives on SAS expanders.
> [...]

we are not using SATA, nor a SAS expander ... we have a bunch of SAS drives, each directly attached to its own SAS controller port (the SSD drives are SATA, but they are directly attached to individual SAS ports too).

we have several systems set up in a similar manner, and the problem only manifests on this one ... but it is also the busiest of the bunch.

cheers
tobi
Re: [OmniOS-discuss] iscsi timeouts
On Tue, 21 Jan 2014, Tobias Oetiker wrote:

> Hi, we are serving iSCSI volumes from our OmniOS box ... in the log on the client I keep seeing this pattern every few hours. Any idea what could be causing this? Server and client are connected directly via a crossover cable over a dedicated interface.
> [...]

It might be a good idea to tell the list what network cards you are using. If they are 1G cards and you are using a Cat 5e cable, do yourself a favor and replace it with a Cat 6 cable.

--
Tim Rice    Multitalents    (707) 456-1146
t...@multitalents.net
Re: [OmniOS-discuss] iscsi timeouts
Hi Tim,

Today Tim Rice wrote:

> It might be a good idea to tell the list what network cards you are using. If they are 1G cards and you are using a Cat 5e cable, do yourself a favor and replace it with a Cat 6 cable.

sure, we have Intel on-board controllers:

07:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
        Subsystem: Intel Corporation Device 3584
        Flags: bus master, fast devsel, latency 0, IRQ 11
        Memory at d096 (32-bit, non-prefetchable)
        I/O ports at 1060
        Memory at d09b (32-bit, non-prefetchable)
        Capabilities: [40] Power Management version 3
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
        Capabilities: [70] MSI-X: Enable+ Count=10 Masked-
        Capabilities: [a0] Express Endpoint, MSI 00
        Capabilities: [e0] Vital Product Data

the cable issue I have to verify ... the odd thing about this behaviour is that there are several KVM virtual machines running on this box as well, and even when OmniOS goes 'offline', the KVM guests (talking over the same physical interface) are still reachable. They themselves cannot talk to OmniOS either ...

cheers
tobi
Re: [OmniOS-discuss] iscsi timeouts
Sorry, I should have given the requisite "yes, I know that this is a recipe for sadness", for I too have experienced said sadness.

That said, we've seen this kind of problem when there was a device in a vdev that was dying a slow death. There wouldn't necessarily be any sign, aside from insanely high service times on an individual device in the pool. From this, I assume that ZFS is still sensitive to variation in underlying drive performance.

Tobi, what do your drive service times look like?
 -nld

On Tue, Jan 21, 2014 at 7:58 AM, Dan McDonald dan...@omniti.com wrote:

> Generally speaking -- you use a SATA drive in a SAS expander at your own risk.
> [...]
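The service-time check Narayan is asking about can be done with `iostat -xn` and a small filter that flags any device with an unusually high average service time (`asvc_t`). A sketch, assuming the usual illumos `-xn` column layout (asvc_t in column 8, device name last) and an arbitrary 50 ms threshold; the here-doc stands in for live output, which you would get from `iostat -xn 5`:

```shell
# Flag devices whose average service time (asvc_t, column 8) exceeds 50 ms.
# Real use:  iostat -xn 5 | awk 'NF >= 11 && $8 + 0 > 50 { print $NF, "asvc_t =", $8 }'
awk 'NF >= 11 && $8 + 0 > 50 { print $NF, "asvc_t =", $8 }' <<'EOF'
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
   12.0    3.0  512.0   64.0  0.0  0.1    0.0    4.2   0   5 c0t0d0
    1.0    2.0   32.0   48.0  0.0  0.9    0.0  812.7   0  99 c0t3d0
EOF
```

One device sitting at hundreds of milliseconds while its siblings serve in single digits is the "dying a slow death" pattern described above, and it can stall the whole pool without tripping any error counter.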
Re: [OmniOS-discuss] iscsi timeouts
Hi Nld,

Today Narayan Desai wrote:

> That said, we've seen this kind of problem when there was a device in a vdev that was dying a slow death. There wouldn't necessarily be any sign, aside from insanely high service times on an individual device in the pool.
> Tobi, what do your drive service times look like?

the drives seem fine; SMART is not reporting anything out of the ordinary, and iostat -En shows 0 on all counts.

I don't think it is a disk issue, but rather something connected with the network ... At times the machine becomes unreachable for a while; it is then still possible to log in via the console, and all seems well internally. Setting the network interface offline and then online again using the dladm tool brings the connectivity back immediately. Waiting helps as well, since the problem sorts itself out after a few seconds to minutes ... we just had another 'off the net' period of 30 minutes.

unfortunately OmniOS itself does not seem to realize that something is off; at least dmesg does not show any kernel messages about this problem ...

we have several systems running on the S2600CP MB ... this is the only one showing problems ... the next thing I intend to do is to upgrade the MB firmware, since I found that this box has an older version than the other ones:

System Configuration: Intel Corporation S2600CP
BIOS Configuration: Intel Corp. SE5C600.86B.01.06.0002.110120121539 11/01/2012

other ideas most welcome!

cheers
tobi
Re: [OmniOS-discuss] iscsi timeouts
On 1/21/14, 10:09 PM, Saso Kiselkov wrote:

> On 1/21/14, 10:01 PM, Tobias Oetiker wrote:
>> I don't think it is a disk issue, but rather something connected with the network ...
>> [...]
>
> You mentioned a couple of e-mails back that you're using Intel I350s. Can you verify that your kernel has:
>
> commit 43ae55058ad99c869a9ae39d039490e8a3680520
> Author: Dan McDonald dan...@nexenta.com
> Date:   Thu Feb 7 19:27:18 2013 -0500
>
>     3534 Disable EEE support in igb for I350
>     Reviewed by: Robert Mustacchi r...@joyent.com
>     Reviewed by: Jason King jason.brian.k...@gmail.com
>     Reviewed by: Marcel Telka mar...@telka.sk
>     Reviewed by: Sebastien Roy sebastien@delphix.com
>     Approved by: Richard Lowe richl...@richlowe.net
>
> I guess you can check for this string at runtime:
>
> $ strings /kernel/drv/amd64/igb | grep _eee_support
>
> If it is missing, then it could be the buggy EEE support that's throwing your link out of whack here.

Nevermind, missed your description of the KVM guests being reachable while only the host goes offline... Did snoop show anything arriving at the host while it is offline?

Cheers,
--
Saso
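Saso's snoop question, spelled out as a sketch. The interface name `igb0` and the client address `10.10.10.2` are assumptions here; substitute your own link name from `dladm show-link` and your iSCSI client's address. This needs to run with root privileges, from the console, while the box is in its "off the net" state:

```shell
# While the box is 'off the net', check whether frames still arrive on the
# physical link.
#   -r  do not resolve addresses to names (avoids hanging on unreachable DNS)
#   -d  capture device
#   -c  stop after 20 packets
snoop -r -d igb0 -c 20 host 10.10.10.2
```

If snoop sees the client's traffic but the host never answers, the problem sits above the NIC in the stack; if the wire is completely silent, suspect the link itself (cable, EEE, NIC firmware).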
Re: [OmniOS-discuss] iscsi timeouts
On 1/21/14, 10:16 PM, Saso Kiselkov wrote:

> Nevermind, missed your description of the KVM guests being reachable while only the host goes offline... Did snoop show anything arriving at the host while it is offline?

However, on second thought, you did mention that you're running a crossover cable between two hosts, which would match the description of the EEE issue: https://illumos.org/issues/3534

> The energy efficient Ethernet (EEE) support in Intel's I350 GigE NIC drops link on directly-attached link cases.

Anyhow, make sure you're running the EEE fix.
--
Saso
Re: [OmniOS-discuss] iscsi timeouts
Today Saso Kiselkov wrote:

> I guess you can check for this string at runtime:
>
> $ strings /kernel/drv/amd64/igb | grep _eee_support
>
> If it is missing, then it could be the buggy EEE support that's throwing your link out of whack here.
> [...]
> Anyhow, make sure you're running the EEE fix.

I think that fix is in:

# strings /kernel/drv/amd64/igb | grep _eee_support
_eee_support

the issue also manifests on the main interface ... the worst-case scenario is that some odd Ethernet packet, present only in the wild-west network where this box lives, is sending the network stack into some sort of a tail-spin ...

cheers
tobi