Re: [OmniOS-discuss] iscsi timeouts

2014-02-03 Thread Saso Kiselkov
On 2/3/14, 10:51 AM, Tobias Oetiker wrote:
 a short update on the matter for anyone browsing the ML archives:
 
 The affected system runs on an S2600CP motherboard with RMM4 remote
 management.  RMM comes with the ability to use any of the existing
 Ethernet ports on the MB for its communication needs ...  we have
 configured it with a separate hw port, but it seems that this
 ability to access the other ports can interfere with omnios
 operation.
 
 10 days ago we upgraded the BIOS to
 version SE5C600.86B.02.01.0002.082220131453 08/22/2013
 
 since then we have not seen any issues ...
 
 I am not 100% sure that this is the solution to the problem, as we
 only found the behaviour after several weeks of uptime ...  in any
 event, for now things look good.

Interesting observation, thanks for keeping the list updated!

Best wishes,
-- 
Saso
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


[OmniOS-discuss] iscsi timeouts

2014-01-21 Thread Tobias Oetiker
Hi,

we are serving iSCSI volumes from our omnios box ... in the log on
the client I keep seeing this pattern every few hours.

any idea what could be causing this?

server and client are connected directly via a crossover cable on a dedicated interface.

Jan 21 01:21:34 iscsi-client kernel:  : [1048707.604535]  connection1:0: ping 
timeout of 5 secs expired, recv timeout 5, last rx 4557604012, last ping 
4557605264, now 4557606516 [kern.err]
Jan 21 01:21:34 iscsi-client kernel:  : [1048707.604656]  connection1:0: 
detected conn error (1011) [kern.info]
Jan 21 01:21:34 iscsi-client kernel:  : [1048707.604661]  connection2:0: ping 
timeout of 5 secs expired, recv timeout 5, last rx 4557604012, last ping 
4557605264, now 4557606516 [kern.err]
Jan 21 01:21:34 iscsi-client kernel:  : [1048707.604763]  connection2:0: 
detected conn error (1011) [kern.info]
Jan 21 01:21:35 iscsi-client iscsid:  Kernel reported iSCSI connection 1:0 
error (1011) state (3) [daemon.warning]
Jan 21 01:21:35 iscsi-client iscsid:  Kernel reported iSCSI connection 2:0 
error (1011) state (3) [daemon.warning]
Jan 21 01:21:57 iscsi-client kernel:  : [1048713.496478] nfs: server 10.10.10.1 
not responding, still trying [kern.notice]
Jan 21 01:21:57 iscsi-client kernel:  : [1048717.843552] nfs: server 10.10.10.1 
not responding, still trying [kern.notice]
Jan 21 01:21:57 iscsi-client kernel:  : [1048718.087086] nfs: server 10.10.10.1 
not responding, still trying [kern.notice]
Jan 21 01:21:57 iscsi-client kernel:  : [1048730.558551] nfs: server 10.10.10.1 
OK [kern.notice]
Jan 21 01:21:57 iscsi-client kernel:  : [1048730.559623] nfs: server 10.10.10.1 
OK [kern.notice]
Jan 21 01:21:57 iscsi-client kernel:  : [1048730.559654] nfs: server 10.10.10.1 
OK [kern.notice]
Jan 21 01:21:59 iscsi-client iscsid:  connection1:0 is operational after 
recovery (2 attempts) [daemon.warning]
Jan 21 01:21:59 iscsi-client iscsid:  connection2:0 is operational after 
recovery (2 attempts) [daemon.warning]

cheers
tobi


-- 
Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland
http://it.oetiker.ch t...@oetiker.ch ++41 62 775 9902 / sb: -9900
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] iscsi timeouts

2014-01-21 Thread David Bomba
It looks like your NFS is dropping too (but then recovering), so I wouldn't
pin the problem solely on iSCSI.

The problem could be anywhere from the network driver all the way back to the 
switch/cables etc. You'll need to go through each item methodically to find the 
root cause.
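
(A quick first pass on that, sketched for an illumos/OmniOS server and a Linux client; igb0 and eth1 are example interface names, adjust them to match the boxes:)

# OmniOS side: per-link packet and error counters, sampled every 5 seconds
dladm show-link -s -i 5 igb0

# Linux client side: NIC driver statistics and interface-level error counters
ethtool -S eth1 | grep -iE 'err|drop|crc'
ip -s link show eth1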

On 21/01/2014, at 8:04 PM, Tobias Oetiker wrote:

 Hi,
 
 we are serving ISCSI volumes from our omnios box ... in the log on
 the client I keep seeing this pattern every few hours.
 
 any idea what could be causing this ?
 
 server and client are connected directly via a crossover cable on a dedicated
 interface.
 
 Jan 21 01:21:34 iscsi-client kernel:  : [1048707.604535]  connection1:0: ping 
 timeout of 5 secs expired, recv timeout 5, last rx 4557604012, last ping 
 4557605264, now 4557606516 [kern.err]
 Jan 21 01:21:34 iscsi-client kernel:  : [1048707.604656]  connection1:0: 
 detected conn error (1011) [kern.info]
 Jan 21 01:21:34 iscsi-client kernel:  : [1048707.604661]  connection2:0: ping 
 timeout of 5 secs expired, recv timeout 5, last rx 4557604012, last ping 
 4557605264, now 4557606516 [kern.err]
 Jan 21 01:21:34 iscsi-client kernel:  : [1048707.604763]  connection2:0: 
 detected conn error (1011) [kern.info]
 Jan 21 01:21:35 iscsi-client iscsid:  Kernel reported iSCSI connection 1:0 
 error (1011) state (3) [daemon.warning]
 Jan 21 01:21:35 iscsi-client iscsid:  Kernel reported iSCSI connection 2:0 
 error (1011) state (3) [daemon.warning]
 Jan 21 01:21:57 iscsi-client kernel:  : [1048713.496478] nfs: server 
 10.10.10.1 not responding, still trying [kern.notice]
 Jan 21 01:21:57 iscsi-client kernel:  : [1048717.843552] nfs: server 
 10.10.10.1 not responding, still trying [kern.notice]
 Jan 21 01:21:57 iscsi-client kernel:  : [1048718.087086] nfs: server 
 10.10.10.1 not responding, still trying [kern.notice]
 Jan 21 01:21:57 iscsi-client kernel:  : [1048730.558551] nfs: server 
 10.10.10.1 OK [kern.notice]
 Jan 21 01:21:57 iscsi-client kernel:  : [1048730.559623] nfs: server 
 10.10.10.1 OK [kern.notice]
 Jan 21 01:21:57 iscsi-client kernel:  : [1048730.559654] nfs: server 
 10.10.10.1 OK [kern.notice]
 Jan 21 01:21:59 iscsi-client iscsid:  connection1:0 is operational after 
 recovery (2 attempts) [daemon.warning]
 Jan 21 01:21:59 iscsi-client iscsid:  connection2:0 is operational after 
 recovery (2 attempts) [daemon.warning]
 
 cheers
 tobi
 
 
 -- 
 Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland
 http://it.oetiker.ch t...@oetiker.ch ++41 62 775 9902 / sb: -9900
 ___
 OmniOS-discuss mailing list
 OmniOS-discuss@lists.omniti.com
 http://lists.omniti.com/mailman/listinfo/omnios-discuss

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] iscsi timeouts

2014-01-21 Thread Narayan Desai
We've seen problems like this when we have a SATA drive in a SAS expander
that is going out to lunch. Are there any drives showing errors in iostat
-En? or any drive timeout messages in the ring buffer?
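
(For anyone following along, a minimal version of those checks on the OmniOS box would look something like this:)

# per-device soft/hard/transport error counters
iostat -En

# recent fault-management error telemetry; disk timeouts and resets show up here
fmdump -e

# kernel ring buffer
dmesg | grep -iE 'timeout|retry|reset'
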
 -nld


On Tue, Jan 21, 2014 at 3:04 AM, Tobias Oetiker t...@oetiker.ch wrote:

 Hi,

 we are serving ISCSI volumes from our omnios box ... in the log on
 the client I keep seeing this pattern every few hours.

 any idea what could be causing this ?

 server and client are connected directly via a crossover cable on a dedicated
 interface.

 Jan 21 01:21:34 iscsi-client kernel:  : [1048707.604535]  connection1:0:
 ping timeout of 5 secs expired, recv timeout 5, last rx 4557604012, last
 ping 4557605264, now 4557606516 [kern.err]
 Jan 21 01:21:34 iscsi-client kernel:  : [1048707.604656]  connection1:0:
 detected conn error (1011) [kern.info]
 Jan 21 01:21:34 iscsi-client kernel:  : [1048707.604661]  connection2:0:
 ping timeout of 5 secs expired, recv timeout 5, last rx 4557604012, last
 ping 4557605264, now 4557606516 [kern.err]
 Jan 21 01:21:34 iscsi-client kernel:  : [1048707.604763]  connection2:0:
 detected conn error (1011) [kern.info]
 Jan 21 01:21:35 iscsi-client iscsid:  Kernel reported iSCSI connection 1:0
 error (1011) state (3) [daemon.warning]
 Jan 21 01:21:35 iscsi-client iscsid:  Kernel reported iSCSI connection 2:0
 error (1011) state (3) [daemon.warning]
 Jan 21 01:21:57 iscsi-client kernel:  : [1048713.496478] nfs: server
 10.10.10.1 not responding, still trying [kern.notice]
 Jan 21 01:21:57 iscsi-client kernel:  : [1048717.843552] nfs: server
 10.10.10.1 not responding, still trying [kern.notice]
 Jan 21 01:21:57 iscsi-client kernel:  : [1048718.087086] nfs: server
 10.10.10.1 not responding, still trying [kern.notice]
 Jan 21 01:21:57 iscsi-client kernel:  : [1048730.558551] nfs: server
 10.10.10.1 OK [kern.notice]
 Jan 21 01:21:57 iscsi-client kernel:  : [1048730.559623] nfs: server
 10.10.10.1 OK [kern.notice]
 Jan 21 01:21:57 iscsi-client kernel:  : [1048730.559654] nfs: server
 10.10.10.1 OK [kern.notice]
 Jan 21 01:21:59 iscsi-client iscsid:  connection1:0 is operational after
 recovery (2 attempts) [daemon.warning]
 Jan 21 01:21:59 iscsi-client iscsid:  connection2:0 is operational after
 recovery (2 attempts) [daemon.warning]

 cheers
 tobi


 --
 Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland
 http://it.oetiker.ch t...@oetiker.ch ++41 62 775 9902 / sb: -9900
 ___
 OmniOS-discuss mailing list
 OmniOS-discuss@lists.omniti.com
 http://lists.omniti.com/mailman/listinfo/omnios-discuss

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] iscsi timeouts

2014-01-21 Thread Dan McDonald

On Jan 21, 2014, at 7:21 AM, Narayan Desai narayan.de...@gmail.com wrote:

 We've seen problems like this when we have a SATA drive in a SAS expander 
 that is going out to lunch. Are there any drives showing errors in iostat 
 -En? or any drive timeout messages in the ring buffer?

Generally speaking -- you use a SATA drive in a SAS expander at your own risk. 
 I used to be at Nexenta, and they would not support customers who deployed 
SATA drives on SAS expanders.  These days, the price delta between SAS and SATA 
(for enterprise) is small enough to be worth it for the headaches you avoid.

Dan


___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] iscsi timeouts

2014-01-21 Thread Tobias Oetiker
Today Dan McDonald wrote:


 On Jan 21, 2014, at 7:21 AM, Narayan Desai narayan.de...@gmail.com wrote:

  We've seen problems like this when we have a SATA drive in a SAS expander 
  that is going out to lunch. Are there any drives showing errors in iostat 
  -En? or any drive timeout messages in the ring buffer?

 Generally speaking -- you use a SATA drive in a SAS expander at
 your own risk.  I used to be at Nexenta, and they would not
 support customers who deployed SATA drives on SAS expanders.
 These days, the price delta between SAS and SATA (for enterprise)
 is small enough to be worth it for the headaches you avoid.

we are not using SATA, nor a SAS expander ...

we have a bunch of SAS drives, each directly attached to its own SAS
controller port ... (the SSD drives are SATA, but they too are
directly attached to individual SAS ports)

we have several systems set up in a similar manner, and the problem
only manifests on this one ... but it is also the busiest of the
bunch.

cheers
tobi

 Dan




-- 
Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland
http://it.oetiker.ch t...@oetiker.ch ++41 62 775 9902 / sb: -9900
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] iscsi timeouts

2014-01-21 Thread Tim Rice
On Tue, 21 Jan 2014, Tobias Oetiker wrote:

 Hi,
 
 we are serving ISCSI volumes from our omnios box ... in the log on
 the client I keep seeing this pattern every few hours.
 
 any idea what could be causing this ?
 
 server and client are connected directly via a crossover cable on a dedicated
 interface.

It might be a good idea to tell the list what network cards you are using.
If they are 1G cards and you are using a Cat 5e cable, do yourself a
favor and replace it with a Cat 6 cable.
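
(A quick way to gather that, assuming an illumos/OmniOS target and a Linux initiator; eth1 is just an example name on the client side:)

# OmniOS side: physical NICs with link state, speed and duplex
dladm show-phys

# Linux client side: NIC model and link negotiation details
lspci | grep -i ethernet
ethtool eth1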

 Jan 21 01:21:34 iscsi-client kernel:  : [1048707.604535]  connection1:0: ping 
 timeout of 5 secs expired, recv timeout 5, last rx 4557604012, last ping 
 4557605264, now 4557606516 [kern.err]
 Jan 21 01:21:34 iscsi-client kernel:  : [1048707.604656]  connection1:0: 
 detected conn error (1011) [kern.info]
 Jan 21 01:21:34 iscsi-client kernel:  : [1048707.604661]  connection2:0: ping 
 timeout of 5 secs expired, recv timeout 5, last rx 4557604012, last ping 
 4557605264, now 4557606516 [kern.err]
 Jan 21 01:21:34 iscsi-client kernel:  : [1048707.604763]  connection2:0: 
 detected conn error (1011) [kern.info]
 Jan 21 01:21:35 iscsi-client iscsid:  Kernel reported iSCSI connection 1:0 
 error (1011) state (3) [daemon.warning]
 Jan 21 01:21:35 iscsi-client iscsid:  Kernel reported iSCSI connection 2:0 
 error (1011) state (3) [daemon.warning]
 Jan 21 01:21:57 iscsi-client kernel:  : [1048713.496478] nfs: server 
 10.10.10.1 not responding, still trying [kern.notice]
 Jan 21 01:21:57 iscsi-client kernel:  : [1048717.843552] nfs: server 
 10.10.10.1 not responding, still trying [kern.notice]
 Jan 21 01:21:57 iscsi-client kernel:  : [1048718.087086] nfs: server 
 10.10.10.1 not responding, still trying [kern.notice]
 Jan 21 01:21:57 iscsi-client kernel:  : [1048730.558551] nfs: server 
 10.10.10.1 OK [kern.notice]
 Jan 21 01:21:57 iscsi-client kernel:  : [1048730.559623] nfs: server 
 10.10.10.1 OK [kern.notice]
 Jan 21 01:21:57 iscsi-client kernel:  : [1048730.559654] nfs: server 
 10.10.10.1 OK [kern.notice]
 Jan 21 01:21:59 iscsi-client iscsid:  connection1:0 is operational after 
 recovery (2 attempts) [daemon.warning]
 Jan 21 01:21:59 iscsi-client iscsid:  connection2:0 is operational after 
 recovery (2 attempts) [daemon.warning]
 
 cheers
 tobi
 

-- 
Tim Rice    Multitalents    (707) 456-1146
t...@multitalents.net


___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] iscsi timeouts

2014-01-21 Thread Tobias Oetiker
Hi Tim,

Today Tim Rice wrote:

 On Tue, 21 Jan 2014, Tobias Oetiker wrote:

  Hi,
 
  we are serving ISCSI volumes from our omnios box ... in the log on
  the client I keep seeing this pattern every few hours.
 
  any idea what could be causing this ?
 
  server and client are connected directly via a crossover cable on a dedicated
  interface.

 It might be a good idea to tell the list what network cards you are using.
 If they are 1G cards and you are using a Cat 5e cable, do yourself a
 favor and replace it with a Cat 6 cable.

sure, we have Intel on-board controllers:

07:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network Connection 
(rev 01)
Subsystem: Intel Corporation Device 3584
Flags: bus master, fast devsel, latency 0, IRQ 11
Memory at d096 (32-bit, non-prefetchable)
I/O ports at 1060
Memory at d09b (32-bit, non-prefetchable)
Capabilities: [40] Power Management version 3
Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
Capabilities: [70] MSI-X: Enable+ Count=10 Masked-
Capabilities: [a0] Express Endpoint, MSI 00
Capabilities: [e0] Vital Product Data

the cable issue I have to verify ...

the odd thing about this behaviour is that there are several KVM
virtual machines running on this box as well, and even when omnios
goes 'offline' the KVM guests (talking over the same physical
interface) are still reachable. They themselves cannot talk to
omnios either ...

cheers
tobi

  Jan 21 01:21:34 iscsi-client kernel:  : [1048707.604535]  connection1:0: 
  ping timeout of 5 secs expired, recv timeout 5, last rx 4557604012, last 
  ping 4557605264, now 4557606516 [kern.err]
  Jan 21 01:21:34 iscsi-client kernel:  : [1048707.604656]  connection1:0: 
  detected conn error (1011) [kern.info]
  Jan 21 01:21:34 iscsi-client kernel:  : [1048707.604661]  connection2:0: 
  ping timeout of 5 secs expired, recv timeout 5, last rx 4557604012, last 
  ping 4557605264, now 4557606516 [kern.err]
  Jan 21 01:21:34 iscsi-client kernel:  : [1048707.604763]  connection2:0: 
  detected conn error (1011) [kern.info]
  Jan 21 01:21:35 iscsi-client iscsid:  Kernel reported iSCSI connection 1:0 
  error (1011) state (3) [daemon.warning]
  Jan 21 01:21:35 iscsi-client iscsid:  Kernel reported iSCSI connection 2:0 
  error (1011) state (3) [daemon.warning]
  Jan 21 01:21:57 iscsi-client kernel:  : [1048713.496478] nfs: server 
  10.10.10.1 not responding, still trying [kern.notice]
  Jan 21 01:21:57 iscsi-client kernel:  : [1048717.843552] nfs: server 
  10.10.10.1 not responding, still trying [kern.notice]
  Jan 21 01:21:57 iscsi-client kernel:  : [1048718.087086] nfs: server 
  10.10.10.1 not responding, still trying [kern.notice]
  Jan 21 01:21:57 iscsi-client kernel:  : [1048730.558551] nfs: server 
  10.10.10.1 OK [kern.notice]
  Jan 21 01:21:57 iscsi-client kernel:  : [1048730.559623] nfs: server 
  10.10.10.1 OK [kern.notice]
  Jan 21 01:21:57 iscsi-client kernel:  : [1048730.559654] nfs: server 
  10.10.10.1 OK [kern.notice]
  Jan 21 01:21:59 iscsi-client iscsid:  connection1:0 is operational after 
  recovery (2 attempts) [daemon.warning]
  Jan 21 01:21:59 iscsi-client iscsid:  connection2:0 is operational after 
  recovery (2 attempts) [daemon.warning]
 
  cheers
  tobi
 



-- 
Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland
http://it.oetiker.ch t...@oetiker.ch ++41 62 775 9902 / sb: -9900
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] iscsi timeouts

2014-01-21 Thread Narayan Desai
Sorry, I should have given the requisite "yes, I know that this is a recipe
for sadness", for I too have experienced said sadness.

That said, we've seen this kind of problem when there was a device in a
vdev that was dying a slow death. There wouldn't necessarily be any sign,
aside from insanely high service times on an individual device in the pool.
From this, I assume that ZFS is still sensitive to variation in underlying
drive performance.

Tobi, what do your drive service times look like?
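
(Per-device service times show up in the asvc_t column of iostat's extended output; watching something like the following for a while should make a single slow drive stand out:)

# extended per-device statistics, 5-second samples, skipping idle devices
iostat -xnz 5
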
 -nld


On Tue, Jan 21, 2014 at 7:58 AM, Dan McDonald dan...@omniti.com wrote:


 On Jan 21, 2014, at 7:21 AM, Narayan Desai narayan.de...@gmail.com
 wrote:

  We've seen problems like this when we have a SATA drive in a SAS
 expander that is going out to lunch. Are there any drives showing errors in
 iostat -En? or any drive timeout messages in the ring buffer?

 Generally speaking -- you use a SATA drive in a SAS expander at your own
 risk.  I used to be at Nexenta, and they would not support customers who
 deployed SATA drives on SAS expanders.  These days, the price delta between
 SAS and SATA (for enterprise) is small enough to be worth it for the
 headaches you avoid.

 Dan



___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] iscsi timeouts

2014-01-21 Thread Tobias Oetiker
Hi Nld,

Today Narayan Desai wrote:

 Sorry, I should have given the requisite "yes, I know that this is a recipe
 for sadness", for I too have experienced said sadness.

 That said, we've seen this kind of problem when there was a device in a
 vdev that was dying a slow death. There wouldn't necessarily be any sign,
 aside from insanely high service times on an individual device in the pool.
 From this, I assume that ZFS is still sensitive to variation in underlying
 drive performance.

 Tobi, what do your drive service times look like?
  -nld

the drives seem fine, SMART is not reporting anything out of the
ordinary and iostat -En shows 0 on all counts

I don't think it is a disk issue, but rather something connected
with the network ...

At times the machine becomes unreachable for a while, and then it
is possible to log in via the console and all seems well internally.
Setting the network interface offline and then online again using
the dladm tool brings the connectivity back immediately. Waiting
helps as well ... since the problem sorts itself out after a few
seconds to minutes ...
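
(The kind of bounce meant here, sketched with ipadm rather than the exact dladm invocation, with igb0 as a placeholder for the affected interface:)

# temporarily take the IP interface down and bring it back up (illumos)
ipadm disable-if -t igb0
ipadm enable-if -t igb0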

we just had another 'off the net' period of 30 minutes

unfortunately omnios itself does not seem to realize that something
is off, at least dmesg does not show any kernel messages about this
problem ...

we have several systems running on the S2600CP MB ... this is the
only one showing problems ...

the next thing I intend to do is to upgrade the MB firmware, since I
found that this box has an older version than the other ones ...

System Configuration: Intel Corporation S2600CP
BIOS Configuration: Intel Corp. SE5C600.86B.01.06.0002.110120121539 11/01/2012

other ideas most welcome!

cheers
tobi


 On Tue, Jan 21, 2014 at 7:58 AM, Dan McDonald dan...@omniti.com wrote:

 
  On Jan 21, 2014, at 7:21 AM, Narayan Desai narayan.de...@gmail.com
  wrote:
 
   We've seen problems like this when we have a SATA drive in a SAS
  expander that is going out to lunch. Are there any drives showing errors in
  iostat -En? or any drive timeout messages in the ring buffer?
 
  Generally speaking -- you use a SATA drive in a SAS expander at your own
  risk.  I used to be at Nexenta, and they would not support customers who
  deployed SATA drives on SAS expanders.  These days, the price delta between
  SAS and SATA (for enterprise) is small enough to be worth it for the
  headaches you avoid.
 
  Dan
 
 
 


-- 
Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland
http://it.oetiker.ch t...@oetiker.ch ++41 62 775 9902 / sb: -9900
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] iscsi timeouts

2014-01-21 Thread Saso Kiselkov
On 1/21/14, 10:09 PM, Saso Kiselkov wrote:
 On 1/21/14, 10:01 PM, Tobias Oetiker wrote:
 Hi Nld,

 Today Narayan Desai wrote:

 Sorry, I should have given the requisite "yes, I know that this is a recipe
 for sadness", for I too have experienced said sadness.

 That said, we've seen this kind of problem when there was a device in a
 vdev that was dying a slow death. There wouldn't necessarily be any sign,
 aside from insanely high service times on an individual device in the pool.
 From this, I assume that ZFS is still sensitive to variation in underlying
 drive performance.

 Tobi, what do your drive service times look like?
  -nld

 the drives seem fine, SMART is not reporting anything out of the
 ordinary and iostat -En shows 0 on all counts

 I don't think it is a disk issue, but rather something connected
 with the network ...

 At times the machine becomes unreachable for a while, and then it
 is possible to log in via the console and all seems well internally.
 Setting the network interface offline and then online again using
 the dladm tool brings the connectivity back immediately. Waiting
 helps as well ... since the problem sorts itself out after a few
 seconds to minutes ...

 we just had another 'off the net' period of 30 minutes

 unfortunately omnios itself does not seem to realize that something
 is off, at least dmesg does not show any kernel messages about this
 problem ...

 we have several systems running on the S2600CP MB ... this is the
 only one showing problems ...

 the next thing I intend to do is to upgrade the MB firmware, since I
 found that this box has an older version than the other ones ...

 System Configuration: Intel Corporation S2600CP
 BIOS Configuration: Intel Corp. SE5C600.86B.01.06.0002.110120121539 
 11/01/2012

 other ideas, most welcome !
 
 You mentioned a couple of e-mails back that you're using Intel I350s.
 Can you verify that your kernel has:
 
 commit 43ae55058ad99c869a9ae39d039490e8a3680520
 Author: Dan McDonald dan...@nexenta.com
 Date:   Thu Feb 7 19:27:18 2013 -0500
 
 3534 Disable EEE support in igb for I350
 Reviewed by: Robert Mustacchi r...@joyent.com
 Reviewed by: Jason King jason.brian.k...@gmail.com
 Reviewed by: Marcel Telka mar...@telka.sk
 Reviewed by: Sebastien Roy sebastien@delphix.com
 Approved by: Richard Lowe richl...@richlowe.net
 
 I guess you can check for this string at runtime:
 $ strings /kernel/drv/amd64/igb | grep _eee_support
 
 If it is missing, then it could be the buggy EEE support that's throwing
 your link out of whack here.

Nevermind, missed your description of the KVM guests being reachable
while only the host goes offline... Did snoop show anything arriving at
the host while it is offline?
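
(A minimal capture for that, assuming the affected data link is igb0 and the client sits at 10.10.10.2; run during one of the outages:)

# grab a handful of packets on the link, without resolving names
snoop -r -d igb0 -c 50

# or narrow it down to traffic to/from the client
snoop -r -d igb0 host 10.10.10.2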

Cheers,
-- 
Saso
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] iscsi timeouts

2014-01-21 Thread Saso Kiselkov
On 1/21/14, 10:16 PM, Saso Kiselkov wrote:
 On 1/21/14, 10:09 PM, Saso Kiselkov wrote:
 On 1/21/14, 10:01 PM, Tobias Oetiker wrote:
 Hi Nld,

 Today Narayan Desai wrote:

 Sorry, I should have given the requisite "yes, I know that this is a recipe
 for sadness", for I too have experienced said sadness.

 That said, we've seen this kind of problem when there was a device in a
 vdev that was dying a slow death. There wouldn't necessarily be any sign,
 aside from insanely high service times on an individual device in the pool.
 From this, I assume that ZFS is still sensitive to variation in underlying
 drive performance.

 Tobi, what do your drive service times look like?
  -nld

 the drives seem fine, SMART is not reporting anything out of the
 ordinary and iostat -En shows 0 on all counts

 I don't think it is a disk issue, but rather something connected
 with the network ...

 At times the machine becomes unreachable for a while, and then it
 is possible to log in via the console and all seems well internally.
 Setting the network interface offline and then online again using
 the dladm tool brings the connectivity back immediately. Waiting
 helps as well ... since the problem sorts itself out after a few
 seconds to minutes ...

 we just had another 'off the net' period of 30 minutes

 unfortunately omnios itself does not seem to realize that something
 is off, at least dmesg does not show any kernel messages about this
 problem ...

 we have several systems running on the S2600CP MB ... this is the
 only one showing problems ...

 the next thing I intend to do is to upgrade the MB firmware, since I
 found that this box has an older version than the other ones ...

 System Configuration: Intel Corporation S2600CP
 BIOS Configuration: Intel Corp. SE5C600.86B.01.06.0002.110120121539 
 11/01/2012

 other ideas, most welcome !

 You mentioned a couple of e-mails back that you're using Intel I350s.
 Can you verify that your kernel has:

 commit 43ae55058ad99c869a9ae39d039490e8a3680520
 Author: Dan McDonald dan...@nexenta.com
 Date:   Thu Feb 7 19:27:18 2013 -0500

 3534 Disable EEE support in igb for I350
 Reviewed by: Robert Mustacchi r...@joyent.com
 Reviewed by: Jason King jason.brian.k...@gmail.com
 Reviewed by: Marcel Telka mar...@telka.sk
 Reviewed by: Sebastien Roy sebastien@delphix.com
 Approved by: Richard Lowe richl...@richlowe.net

 I guess you can check for this string at runtime:
 $ strings /kernel/drv/amd64/igb | grep _eee_support

 If it is missing, then it could be the buggy EEE support that's throwing
 your link out of whack here.
 
 Nevermind, missed your description of the KVM guests being reachable
 while only the host goes offline... Did snoop show anything arriving at
 the host while it is offline?

However, on second thought, you did mention that you're running a
crossover cable between two hosts, which would match the description of the
EEE issue:

https://illumos.org/issues/3534
The energy efficient Ethernet (EEE) support in Intel's I350 GigE NIC
drops link on directly-attached link cases.

Anyhow, make sure you're running the EEE fix.

-- 
Saso
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] iscsi timeouts

2014-01-21 Thread Tobias Oetiker
Today Saso Kiselkov wrote:

  I guess you can check for this string at runtime:
  $ strings /kernel/drv/amd64/igb | grep _eee_support
 
  If it is missing, then it could be the buggy EEE support that's throwing
  your link out of whack here.
 
  Nevermind, missed your description of the KVM guests being reachable
  while only the host goes offline... Did snoop show anything arriving at
  the host while it is offline?

 However, on second thought, you did mention that you're running
 crossover between two hosts, which would match the description of the
 EEE issue:

 https://illumos.org/issues/3534
 The energy efficient Ethernet (EEE) support in Intel's I350 GigE NIC
 drops link on directly-attached link cases.

 Anyhow, make sure you're running the EEE fix.


I think that fix is in:

# strings /kernel/drv/amd64/igb | grep _eee_support
_eee_support

the issue also manifests on the main interface ...

the worst-case scenario is that some odd ethernet packet, present only
in the wild-west network where this box lives, is sending the
network stack into some sort of tail-spin ...

cheers
tobi
-- 
Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland
http://it.oetiker.ch t...@oetiker.ch ++41 62 775 9902 / sb: -9900
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss