On 5/21/2012 9:04 PM, Matthew Gamble wrote:
> We have a box with 3 SiI3124 SATA controllers and 9 CFI-B53PM 5 Port
> Backplane port multipliers (the "backblaze storage pod"). Under intense IO
> (ZFS rebuild, presently) the system will lock up all IO for 3-4 minutes and
> the following entry appears in the dmesg:
>
> siisch11: Timeout on slot 30
> siisch11: siis_timeout is 00040000 ss 65000000 rs 65000000 es 00000000 sts
> 80192000 serr 00000000
> siisch11: ... waiting for slots 25000000
> siisch11: Timeout on slot 26
> siisch11: siis_timeout is 00040000 ss 65000000 rs 65000000 es 00000000 sts
> 80192000 serr 00000000
> siisch11: ... waiting for slots 21000000
> siisch11: Timeout on slot 29
> siisch11: siis_timeout is 00040000 ss 65000000 rs 65000000 es 00000000 sts
> 80192000 serr 00000000
> siisch11: ... waiting for slots 01000000
> siisch11: Timeout on slot 24
> siisch11: siis_timeout is 00040000 ss 65000000 rs 65000000 es 00000000 sts
> 80192000 serr 00000000
>
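For reference, the hex masks in the log appear to carry one bit per command slot. The sketch below is in Python and assumes the values after "ss" and "waiting for slots" are plain slot bitmasks. That assumption is consistent with the slot numbers in the timeouts above, but it has not been checked against the driver source. The helper name is made up for illustration.

```python
def slots_in_mask(mask_hex):
    """Return the slot numbers whose bits are set in a 32-bit hex mask string."""
    mask = int(mask_hex, 16)
    return [bit for bit in range(32) if mask & (1 << bit)]

# Masks copied from the quoted dmesg output.
print(slots_in_mask("65000000"))  # [24, 26, 29, 30]
print(slots_in_mask("25000000"))  # [24, 26, 29]
print(slots_in_mask("21000000"))  # [24, 29]
print(slots_in_mask("01000000"))  # [24]
```

Read this way, four commands (slots 24, 26, 29 and 30) were outstanding on siisch11, and each timeout message drops one of them from the waiting set.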
> The errors are on different siisch devices so it's not likely to be a SATA
> cable issue unless multiple cables all went bad at the same time. On the
> advice of some other posts to the mailing list I've already tried locking the
> SATA rev to one with the following in /boot/loader.conf which didn't
If they are on different siisch devices then yes, it does not sound like
a bad cable. However, I have seen errors similar to the ones above that
were fixed by replacing the cables. If you are running 9.0-RELEASE, I
would suggest upgrading to -STABLE. There have been a few bug fixes and
improvements to the drivers as well as to various parts of the disk
subsystem. I am running RELENG_8 right now, which is for the most part
similar to 9.x, and it's quite stable for me on a 25TB system:
# zpool status
  pool: zbackup1
 state: ONLINE
  scan: scrub repaired 0 in 11h11m with 0 errors on Mon Jul 25 19:51:11 2011
config:

        NAME        STATE     READ WRITE CKSUM
        zbackup1    ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            ada14   ONLINE       0     0     0
            ada16   ONLINE       0     0     0
            ada13   ONLINE       0     0     0
            ada15   ONLINE       0     0     0
          raidz1-1  ONLINE       0     0     0
            ada0    ONLINE       0     0     0
            ada1    ONLINE       0     0     0
            ada2    ONLINE       0     0     0
            ada3    ONLINE       0     0     0
          raidz1-2  ONLINE       0     0     0
            ada4    ONLINE       0     0     0
            ada5    ONLINE       0     0     0
            ada6    ONLINE       0     0     0
            ada7    ONLINE       0     0     0
          raidz1-3  ONLINE       0     0     0
            ada9    ONLINE       0     0     0
            ada10   ONLINE       0     0     0
            ada11   ONLINE       0     0     0
            ada12   ONLINE       0     0     0

errors: No known data errors

# zpool get all zbackup1
NAME      PROPERTY       VALUE                SOURCE
zbackup1  size           25.4T                -
zbackup1  capacity       68%                  -
zbackup1  altroot        -                    default
zbackup1  health         ONLINE               -
zbackup1  guid           917659042733882722   default
zbackup1  version        28                   default
zbackup1  bootfs         -                    default
zbackup1  delegation     on                   default
zbackup1  autoreplace    off                  default
zbackup1  cachefile      -                    default
zbackup1  failmode       wait                 default
zbackup1  listsnapshots  on                   local
zbackup1  autoexpand     off                  default
zbackup1  dedupditto     0                    default
zbackup1  dedupratio     1.00x                -
zbackup1  free           7.95T                -
zbackup1  allocated      17.4T                -
zbackup1  readonly       off                  -
zbackup1  comment        -                    default

This is on an Addonics adaptor.
---Mike
>
> hint.siisch.0.sata_rev=1
> hint.siisch.1.sata_rev=1
> hint.siisch.2.sata_rev=1
> hint.siisch.3.sata_rev=1
> hint.siisch.4.sata_rev=1
> hint.siisch.5.sata_rev=1
> hint.siisch.6.sata_rev=1
> hint.siisch.7.sata_rev=1
> hint.siisch.8.sata_rev=1
> hint.siisch.9.sata_rev=1
> hint.siisch.10.sata_rev=1
> hint.siisch.11.sata_rev=1
>
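The twelve hint lines above follow a regular pattern, so they could be generated rather than typed by hand. This is a minimal Python sketch. It assumes three controllers with four channels each (siisch0 through siisch11, matching the quoted loader.conf), and the function name and parameters are hypothetical.

```python
def siisch_hints(controllers=3, channels_per_controller=4, sata_rev=1):
    """Build loader.conf hint lines that cap the SATA revision on each channel."""
    total = controllers * channels_per_controller
    return [f"hint.siisch.{n}.sata_rev={sata_rev}" for n in range(total)]

# Print the lines so they can be reviewed before being pasted into /boot/loader.conf.
print("\n".join(siisch_hints()))
```

The output should be reviewed before it goes into /boot/loader.conf, since a different controller count changes the channel numbering.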
> From time to time this is also causing one of the attached drives to go
> offline:
>
> siisch0: siis_timeout is 00040000 ss 40000000 rs 40000000 es 00000000 sts
> 801f2000 serr 00000000
> (ada0:siisch0:0:0:0): lost device
> (ada0:siisch0:0:0:0): removing device entry
> ada0 at siisch0 bus 0 scbus0 target 0 lun 0
> ada0: <WDC WD30EZRX-00MMMB0 80.00A80> ATA-8 SATA 3.x device
> ada0: 150.000MB/s transfers (SATA 1.x, UDMA6, PIO 8192bytes)
> ada0: Command Queueing enabled
> ada0: 2861588MB (5860533168 512 byte sectors: 16H 63S/T 16383C)
> ada0: Previously was known as ad4
> siisch11: Timeout on slot 30
>
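To check whether the timeouts cluster on one channel or are spread across several, the dmesg lines can be tallied per channel. The sketch below is in Python and assumes the "siischN: Timeout on slot M" format shown in the quoted logs. The function name and the sample list are illustrative only.

```python
import re
from collections import Counter

# Matches lines such as "siisch11: Timeout on slot 30".
TIMEOUT_RE = re.compile(r"^(siisch\d+): Timeout on slot (\d+)")

def timeouts_per_channel(lines):
    """Count 'Timeout on slot' messages for each siisch channel."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts

# A few lines taken from the quoted dmesg output.
sample = [
    "siisch11: Timeout on slot 30",
    "siisch11: siis_timeout is 00040000 ss 65000000 rs 65000000 es 00000000 sts",
    "siisch11: Timeout on slot 26",
    "siisch0: siis_timeout is 00040000 ss 40000000 rs 40000000 es 00000000 sts",
]
print(timeouts_per_channel(sample))
```

Run against a saved copy of the full dmesg output, a tally dominated by one channel would point back at a cable, port multiplier or drive on that channel. An even spread would make a driver or controller issue more plausible.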
> When the drive goes offline that causes the ZFS rebuild to restart, and so
> it's never finishing the rebuild of the array. Does anyone have any insight
> into what could be causing the timeouts and what we can do to resolve them?
> Right now my priority is to get the system a bit more stable so the current
> ZFS rebuild can complete – right now it's been doing the same rebuild for
> just over 6 days and the timeouts and drive drop offs are causing it to
> restart constantly.
>
> _______________________________________________
> [email protected] mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "[email protected]"
--
-------------------
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, [email protected]
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada http://www.tancsa.com/