Re: Read / write timeouts on SATA disks connected to ICH9

2010-05-16 Thread Pieter de Boer

Hi Jeremy,

SNIP: both old disks were fine

Anyway, if heavy disk/controller load appears to be causing these
problems, you could have power-related issues.  Possibly the combination
of two disks + heavy I/O causes enough power draw that the ICH9 starts
to behave oddly.  Voltages which deviate too much can cause odd things
to happen to hardware.  If you have the time/money, you might try
replacing the PSU in your system to see if there's any improvement; your
BIOS should be able to provide you Hardware Monitoring statistics
(voltages).  Write these down before and after the PSU swap.  You don't
need to go crazy and buy a 1000W PSU or anything, but 450-750W is pretty
normal these days.

As this is a 19" 1U box, I'd need to buy a replacement PSU from Dell or 
a reseller. Not too expensive, but I'd like to avoid that.


While looking through the CVSweb of RELENG_8, I found that ATA timeouts 
have been raised in 8 recently. On 
http://wiki.freebsd.org/JeremyChadwick/ATA_issues_and_troubleshooting 
and other URLs, like 
http://linux-bsd-sharing.blogspot.com/2009/03/howto-fix-sata-dma-timeout-issues-on.html, 
I found that increasing the timeout might help. So that's what I'll try 
next time it happens again. If that still doesn't work, I can take a 
better look at the voltage levels.


--
Pieter

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: Read / write timeouts on SATA disks connected to ICH9

2010-05-15 Thread Pieter de Boer

Hi Jeremy,


Lots to say about all of this.


Thanks for your elaborate reply, it was very useful to see smartctl 
output explained a bit :) I still think there's something else in play 
beside disk failure. I've checked one of the drives I replaced earlier, 
but that one doesn't have any of the errors in its SMART output you 
described, although it did drop out of the mirror multiple times during 
its lifetime.



The WD Caviar Black drives have a useful feature called TLER -- it's
disabled by default, for reasons which I don't want to get into here --
which can force the drive to internally give up after X seconds (it's
user-selectable) when dealing with such remapping/errors.  The idea is
to keep the drive from being deemed dead from the OS/controller's point
of view.  I believe Seagate, Hitachi, or Samsung (I forget which) have
this feature as well, but it's not called TLER.

I've read about this feature, but didn't have the time to try to get it 
turned on (iirc you'd need a specific Western Digital DOS-based util or 
something).



If you want to find out the exact LBA that has the problem (there may be
more than one), I can step you through performing a selective LBA scan
using SMART, since this model of disk does support such.  It's easy to
do, easy to understand the results, and can be done while the drive is
in operation (though I would recommend trying to keep disk I/O to a
minimum during this test).  Let me know.

At a certain point in time I had read errors from specific LBA's on ad4. 
Using dd I was able to pinpoint those to single sectors. Overwriting 
those sectors with what was on ad6 made them readable again. What is odd 
is that the 'remapped sector' count of ad4 is 0.


Still I'd like to know how to perform such a scan.

  Finally, your vmstat -i output:



# vmstat -i
interrupt  total   rate
irq23: atapci0 371021299  10423


Good to know there's no IRQ sharing going on, but what does worry me is
the interrupt rate (10K interrupts/second).  That seems *extremely*
high, but it also depends on what kind of disk I/O is happening on this
system -- especially since you have 2 disks attached to the same
controller.

The rate is higher than 1 also at idle. During a gmirror sync from 
ad6 to ad4, it's about 10670.



iostat 1, iostat -x 1, or gstat might come in handy to tell you
what kind of disk I/O is going on.  If actual I/O is very little, then
something weird is going on with regards to the number of interrupts
being seen on IRQ 23.  mav@ might have some ideas, otherwise I'd
recommend rebooting the machine and seeing if the number drops.  If so,
it may be that the OS has some sort of bug where a disk timing out or
falling off the bus causes interrupt problems.  (It's too bad you don't
have AHCI on this system.  It handles stuff like this much more
elegantly...)

If neither mav@ nor anyone else has further insight into the interrupt 
rate, I guess a reboot will at least show whether it's persistent or 
related to the errors. I'll try to do a reboot when convenient (probably 
Sunday morning or something).
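
A note on reading these numbers: on FreeBSD the rate column of vmstat -i is, as far as can be told, the total divided by uptime, so it is an average since boot rather than the current rate. A minimal sketch for measuring the current rate by sampling the total twice follows. The helper names and the 10-second interval are illustrative assumptions, not something taken from this thread.

```sh
#!/bin/sh
# Print the running total for one interrupt source from `vmstat -i` output
# read on stdin. Assumes the layout shown in this thread:
#   irq23: atapci0  <total>  <rate>
irq_total() {
    awk -v irq="$1:" '$1 == irq { print $3 }'
}

# irq_rate BEFORE AFTER SECONDS: interrupts per second between two samples.
irq_rate() {
    echo $(( ($2 - $1) / $3 ))
}

# Sample irq23 twice, 10 seconds apart, and print the current rate.
sample_irq23() {
    before=$(vmstat -i | irq_total irq23)
    sleep 10
    after=$(vmstat -i | irq_total irq23)
    irq_rate "$before" "$after" 10
}
```

Running sample_irq23 once at idle and once during a gmirror sync would show whether the interrupt load really differs between the two, independent of how long the box has been up.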


Thanks,
Pieter






Re: Read / write timeouts on SATA disks connected to ICH9

2010-05-15 Thread Pieter de Boer

Hi Terry,


I have a bunch of R300's here. From one that is using the on-board SATA
and 2 drives in a gmirror setup (very similar to the OP) after 18 hours
of uptime:

[0:2] speedtest:~ vmstat -i
interrupt  total   rate
irq23: atapci0 254116  3

Interesting. Which version of FreeBSD is this system running? I guess 
you didn't experience any of the timeouts I'm seeing?



  I also have another R300 with Dell's SAS 6/iR card (a re-branded LSI
1068-something, seen as mpt by FreeBSD). While Dell only sells that as
part of a package deal with the hot-swap backplane and redundant power
supplies, there's no reason you couldn't pick one up on eBay and add it
yourself. You'll need some sort of breakout cable to get from the big
connector on the SAS 6 to individual SATA ports.

Yeah, this R300 was bought second-hand and unfortunately the owner 
pulled the RAID card out. It's something to consider, getting one of 
those cards. Do you use the RAID-features of the drive and if so, does 
that work well? I'm a bit hesitant to use hardware raid; it would be a 
big plus if the RAID disks could also be used stand-alone if need be 
(which is easy with gmirror because of its metadata being stored in the 
drive's last sector).


Thanks,
Pieter



Re: Read / write timeouts on SATA disks connected to ICH9

2010-05-15 Thread Pieter de Boer

Hi there,


what kind of disk I/O is going on.  If actual I/O is very little, then
something weird is going on with regards to the number of interrupts
being seen on IRQ 23.  mav@ might have some ideas, otherwise I'd
recommend rebooting the machine and seeing if the number drops.  If so,
it may be that the OS has some sort of bug where a disk timing out or
falling off the bus causes interrupt problems.  (It's too bad you don't
have AHCI on this system.  It handles stuff like this much more
elegantly...)

Well, due to a UFS snapshot panic the box was rebooted, and now I only 
see around 1500 interrupts per second, while syncing the mirror.


--
Pieter


Re: Read / write timeouts on SATA disks connected to ICH9

2010-05-15 Thread Terry Kennedy

Interesting. Which version of FreeBSD is this system running? I guess
you didn't experience any of the timeouts I'm seeing?


 8-STABLE as of the 11th of this month, or thereabouts. No, I've never
seen a disk timeout on that box.


Yeah, this R300 was bought second-hand and unfortunately the owner
pulled the RAID card out. It's something to consider, getting one of
those cards. Do you use the RAID-features of the drive and if so, does
that work well? I'm a bit hesitant to use hardware raid; it would be a
big plus if the RAID disks could also be used stand-alone if need be
(which is easy with gmirror because of its metadata being stored in the
drive's last sector).


Does your system have hot-swap drive bays and the SAS backplane? If it
at least has hot-swap bays, then you could always add the backplane,
cable, and controller.

 I'm using the hardware mirroring on the SAS 6/iR card (with a pair of
WD3000HLFS drives, since the previous owner took the factory drives out
before selling the system).

 I haven't tried taking one of those drives and seeing if it will boot
on a standalone SATA port. I have removed both drives, installed a scratch
drive, and installed Windows on it to run one of the Dell update installers
(not all of them come in DOS or Linux flavors). The controller didn't
mind the swap a bit (or the swap back to the 2 RAID drives). That's a lot
better than the old amr-based RAID cards.

   Terry Kennedy http://www.tmk.com
   te...@tmk.com New York, NY USA


Re: Read / write timeouts on SATA disks connected to ICH9

2010-05-15 Thread Miroslav Lachman

Pieter de Boer wrote:

Hi there,


what kind of disk I/O is going on. If actual I/O is very little, then
something weird is going on with regards to the number of interrupts
being seen on IRQ 23. mav@ might have some ideas, otherwise I'd
recommend rebooting the machine and seeing if the number drops. If so,
it may be that the OS has some sort of bug where a disk timing out or
falling off the bus causes interrupt problems. (It's too bad you don't
have AHCI on this system. It handles stuff like this much more
elegantly...)

Well, due to a UFS snapshot panic the box was rebooted, and now I only
see around 1500 interrupts per second, while syncing the mirror.


I have seen high interrupt rates on 7.x systems after pulling out/in one drive in 
gmirror [1] even if it was successfully disconnected by gmirror remove + 
atacontrol detach and reconnected by atacontrol attach + gmirror insert.

It was not 100% reproducible, but it seems the bug is still there in 8.x.
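
The swap sequence described above can be sketched as a small script. The mirror name gm0 and provider ad4 appear elsewhere in this thread; the channel name ata2 is an assumption (atacontrol list shows the real channel), and the run wrapper is a hypothetical helper that only prints the commands unless DRY_RUN=0 is set.

```sh
#!/bin/sh
# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# swap_disk MIRROR PROVIDER CHANNEL
swap_disk() {
    run gmirror remove "$1" "$2"      # take the component out of the mirror
    run atacontrol detach "$3"        # detach the ATA channel before pulling the drive
    if [ "${DRY_RUN:-1}" != "1" ]; then
        printf 'Swap the drive now, then press Enter: '
        read -r _
    fi
    run atacontrol attach "$3"        # re-probe the channel
    run gmirror insert "$1" "$2"      # add the drive back; gmirror resynchronizes it
}
```

With the default dry run, swap_disk gm0 ad4 ata2 prints the four commands in order so they can be checked before anything touches the mirror.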

[1] 
http://lists.freebsd.org/pipermail/freebsd-stable/2008-October/046003.html


Miroslav Lachman


Re: Read / write timeouts on SATA disks connected to ICH9

2010-05-15 Thread Jeremy Chadwick
On Sat, May 15, 2010 at 09:04:11AM +0200, Pieter de Boer wrote:
 Thanks for your elaborate reply, it was very useful to see smartctl
 output explained a bit :) I still think there's something else in
 play beside disk failure. I've checked one of the drives I replaced
 earlier, but that one doesn't have any of the errors in its SMART
 output you described, although it did drop out of the mirror
 multiple times during its lifetime.

That could be caused by a multitude of other known things.  For example,
some Western Digital Green drives (including the Enterprise class
ones) are known to perform head parking/offloading excessively, which
could result in the drive spending more time doing that than actually
serving overall I/O requests.  There are some other reports of Samsung
Spinpoint drives experiencing other issues (I've since forgotten and
would have to dig up the threads).

If you could provide full SMART stats for that drive, it might help.

 The WD Caviar Black drives have a useful feature called TLER -- it's
 disabled by default, for reasons which I don't want to get into here --
 which can force the drive to internally give up after X seconds (it's
 user-selectable) when dealing with such remapping/errors.  The idea is
 to keep the drive from being deemed dead from the OS/controller's point
 of view.  I believe Seagate, Hitachi, or Samsung (I forget which) have
 this feature as well, but it's not called TLER.

 I've read about this feature, but didn't have the time to try to get
 it turned on (iirc you'd need a specific Western Digital DOS-based
 util or something).

Yes, it's a DOS-based utility (like most firmware upgraders these days).
I can provide it if you'd like.  I've been meaning to spend some time
trying to reverse-engineer the binary to figure out what ATA commands it
sends to the disk to toggle/adjust the feature (so that one could do it
in real-time rather than have to boot into DOS).

 If you want to find out the exact LBA that has the problem (there may be
 more than one), I can step you through performing a selective LBA scan
 using SMART, since this model of disk does support such.  It's easy to
 do, easy to understand the results, and can be done while the drive is
 in operation (though I would recommend trying to keep disk I/O to a
 minimum during this test).  Let me know.

 At a certain point in time I had read errors from specific LBA's on
 ad4. Using dd I was able to pinpoint those to single sectors.

This isn't very effective (dd will read large chunks/amounts of data
(read: multiple LBAs) from the underlying disk at once, rather than the
disk itself performing a per-LBA test).  My opinion is that the dd
method should only be used on drives which don't support selective LBA
scanning via SMART.

 Overwriting those sectors with what was on ad6 made them readable
 again. What is odd is that the 'remapped sector' count of ad4 is 0.

What may have happened is that the drive took a while to read certain
LBAs (long enough for the OS/controller to time out), but that internal
drive ECC was used to correct the reads and the sectors therefore *did
not* need to be remapped.  I do see that Attribute 1 on ad4 is non-zero,
which could indicate said situation, but WD doesn't provide Attribute
195 (ECC recovery rate), which could help here.

SMART implementations are usually quite good (particularly in recent WD
drives), but I have seen situations where certain counters are,
erroneously, not being incremented or changed.  I've seen a couple brand
new disks come out of the factory with non-zero values (indicating
someone at the fab forgot to clear them before shipping).  I'd love to
get my hands on a WD utility that zeros out the counters and re-flashes
the drive firmware to rule out any oddities.

It's been proven already that WD will re-use the same F/W version
number despite some code being changed.  There was a FreeBSD user who
got a F/W fix from WD for the head offloading/parking ordeal (see above,
re: WD GP), and the firmware versions of the old and the new were
the same.  Tracking stuff like this down is basically impossible unless
MD5/SHAs of the firmware files can be provided (good luck).

All HD vendors have their own quirks/ordeals right now.  You basically
just have to go with one who works well for you, then if things start
going downhill, switch to another.  None of them are perfect.

 Still I'd like to know how to perform such a scan.

smartctl -t select,0-max disk

This will start a selective LBA scan from LBA 0 to the end of the disk.
If any error is encountered, the scan stops and the error -- including
the LBA where an error was seen -- is output in the SMART self-test and
SMART selective self-test logs.  You can then write down the LBA, and
then re-run the above command replacing 0 with the LBA+1 where the
error was seen.
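
The scan-and-resume procedure just described can be sketched as a few shell helpers. The function names are hypothetical; the smartctl options used (-t select,N-max, -l selftest, -l selective) are the ones smartmontools documents for selective self-tests.

```sh
#!/bin/sh
# select_span [FAILED_LBA]: span argument for a scan of the whole disk, or
# for one resuming just past the LBA where the previous scan stopped.
select_span() {
    if [ -n "$1" ]; then
        echo "select,$(( $1 + 1 ))-max"
    else
        echo "select,0-max"
    fi
}

# scan_from DISK [FAILED_LBA]: start the selective self-test on DISK.
scan_from() {
    smartctl -t "$(select_span "$2")" "$1"
}

# show_scan_logs DISK: print the self-test and selective self-test logs,
# which list the LBA of the first error if the scan stopped early.
show_scan_logs() {
    smartctl -l selftest "$1"
    smartctl -l selective "$1"
}
```

A typical round would be scan_from /dev/ad4, then show_scan_logs /dev/ad4 once the test has finished, then scan_from /dev/ad4 LBA with the reported LBA to continue past it.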

Here's an example of what a failed selective scan looks like (taken from
a Hitachi disk I just dealt with at work a few weeks ago, starting at

Re: Read / write timeouts on SATA disks connected to ICH9

2010-05-15 Thread Pieter de Boer

Hi,

SNIP: disk without errors timing out

That could be caused by a multitude of other known things.  For
example, some Western Digital Green drives (including the
Enterprise class ones) are known to perform head parking/offloading
excessively, which could result in the drive spending more time doing
that than actually serving overall I/O requests.  There are some
other reports of Samsung Spinpoint drives experiencing other issues
(I've since forgotten and would have to dig up the threads).



If you could provide full SMART stats for that drive, it might help.

Attached the SMART output of both disks I replaced about a month ago. It
appears I replaced perfectly fine drives with the current disks with
errors ;(  One of the old disks is in a USB-enclosure now, so 'da0'.

SNIP: enabling TLER

Yes, it's a DOS-based utility (like most firmware upgraders these
days). I can provide it if you'd like.  I've been meaning to spend
some time trying to reverse-engineer the binary to figure out what
ATA commands it sends to the disk to toggle/adjust the feature (so
that one could do it in real-time rather than have to boot into DOS).


I'd like to try that tool. Since the old WD disks are now lying around
at home, I have some time to get a DOS boot working to try it out. A
FreeBSD-implementation of the WD tool and possibly other brands would be
really useful indeed.


At a certain point in time I had read errors from specific LBA's on
 ad4. Using dd I was able to pinpoint those to single sectors.


This isn't very effective (dd will read large chunks/amounts of data 
(read: multiple LBAs) from the underlying disk at once, rather than
the disk itself performing a per-LBA test).  My opinion is that the
dd method should only be used on drives which don't support
selective LBA scanning via SMART.

Will dd read multiple LBAs even when using 'bs=512'? The process I used
was reading using bs=8192, then zooming in on the LBA's mentioned in
the errors in dmesg with bs=512 to find the actual LBA.
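
For reference, the zoom-in procedure described here can be sketched as follows. The helper names are hypothetical, and the arithmetic assumes 512-byte logical sectors and a shell with 64-bit arithmetic. This only shows how the byte offsets in the GEOM_MIRROR messages map to LBAs and how a single sector can be requested with dd; it does not settle how the request is handled further down the stack.

```sh
#!/bin/sh
# offset_to_lba BYTES: convert a byte offset on the provider (as printed in a
# GEOM_MIRROR "offset=" field) into a 512-byte LBA.
offset_to_lba() {
    echo $(( $1 / 512 ))
}

# length_to_sectors BYTES: number of 512-byte sectors a request covers.
length_to_sectors() {
    echo $(( ($1 + 511) / 512 ))
}

# read_sector DISK LBA: ask dd for exactly one 512-byte block at LBA and
# discard the data; dd's exit status shows whether the read succeeded.
read_sector() {
    dd if="$1" of=/dev/null bs=512 skip="$2" count=1
}
```

For the write failure quoted at the start of the thread, offset_to_lba 200404975104 gives 391415967, and the 16384-byte request spans 32 sectors from there.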

A selective scan on ad4 did not reveal any errors today: it 'completed 
without error'. On ad6 it's a whole lot slower; at the time of writing 
it's at 2/3.



All HD vendors have their own quirks/ordeals right now.  You
basically just have to go with one who works well for you, then if
things start going downhill, switch to another.  None of them are
perfect.

I figured as much. What irritates though is that I've had consistent 
problems with 4 disks in this specific system, but not (such) issues 
with any other disk in other systems I've had. I generally replace disks 
when I grow out of them, not because they break down.



What this indicates to me is that if a disk falls off the bus on an
ICH9 controller in Enhanced (non-AHCI) mode, FreeBSD starts seeing an
absurd number of interrupts generated from the ICH9.  My guess is
FreeBSD isn't doing something correctly with the controller when this
happens; maybe certain commands aren't being sent back to the
controller or handling of certain events is being done improperly
when it comes to ICH9 (or possibly earlier ICH revisions too).  This
should be *very* easy to reproduce.


Unfortunately I'm not really in a position to help reproducing this or 
testing possible fixes; downtime is currently very unwelcome. Although 
one of the previous disks indeed fell off the bus entirely (couldn't get 
it back with atacontrol either), that hasn't happened again so far. I 
only see timeouts (and a few days ago read errors on ad4) which gmirror 
doesn't like. I guess those aren't that simple to reproduce (apart from 
on my system ;).



If you see any of your disks on the ICH9 controller fall off the bus
or report ATA errors (doesn't matter what kind), please make note of
the timestamp (should be in the kernel log), and ASAP run smartctl
-a on the disk.  You should compare attributes before and after the
event.
You might also want to consider using smartd, which can log SMART 
attribute changes on its own.  Note that you might have to tune the 
arguments in smartd.conf to ignore some attributes which fluctuate 
naturally (such as drive temperature and seek error rate).


I've configured smartd to poll both disks every 5 minutes. I -think- the 
issues happen specifically under load: the periodic scripts of the host 
and its 4 jails appear to trigger it sometimes. At that time I'm 
normally trying to get some sleep, so smartd will have to do for now. 
Although I'll run a smartctl -a asap anyway.
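
A smartd setup along these lines might look like the sketch below. It prints candidate smartd.conf lines rather than writing the file, so they can be reviewed first. The helper name and the mail recipient (root) are assumptions; attributes 194 (Temperature_Celsius) and 7 (Seek_Error_Rate) are the two naturally fluctuating ones mentioned above.

```sh
#!/bin/sh
# smartd_lines DISK...: print one smartd.conf line per disk.
#   -a      the standard set of health, attribute and log checks
#   -I 194  do not report changes in attribute 194 (Temperature_Celsius)
#   -I 7    do not report changes in attribute 7 (Seek_Error_Rate)
#   -m root mail warnings to root (recipient is an assumption)
smartd_lines() {
    for disk in "$@"; do
        printf '%s -a -I 194 -I 7 -m root\n' "$disk"
    done
}
```

smartd_lines /dev/ad4 /dev/ad6 prints two lines that can be appended to smartd.conf. The 5-minute polling interval itself is set on the daemon, with smartd's -i 300 option (the interval is given in seconds).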


--
Pieter






Re: Read / write timeouts on SATA disks connected to ICH9

2010-05-15 Thread Pieter de Boer

Attached the SMART output of both disks I replaced about a month ago. It
appears I replaced perfectly fine drives with the current disks with
errors ;(  One of the old disks is in a USB-enclosure now, so 'da0'.


Let's send those attachments, then.

--
Pieter
smartctl 5.39 2009-12-09 r2995 [FreeBSD 8.0-STABLE i386] (local build)
Copyright (C) 2002-9 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital RE3 Serial ATA family
Device Model:     WDC WD5002ABYS-18B1B0
Serial Number:    WD-WMASY5474089
Firmware Version: 02.03B03
User Capacity:    500,107,862,016 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Sat May 15 21:53:04 2010 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (9480) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off
                                        support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 112) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x303f) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   179   179   021    Pre-fail  Always       -       4033
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       89
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   093   093   000    Old_age   Always       -       5536
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       74
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       71
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       89
194 Temperature_Celsius     0x0022   100   094   000    Old_age   Always       -       47
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      5487         -
# 2  Extended offline    Completed without error       00%

Re: Read / write timeouts on SATA disks connected to ICH9

2010-05-15 Thread Jeremy Chadwick
On Sat, May 15, 2010 at 11:16:33PM +0200, Pieter de Boer wrote:
 Attached the SMART output of both disks I replaced about a month ago. It
 appears I replaced perfectly fine drives with the current disks with
 errors ;(  One of the old disks is in a USB-enclosure now, so 'da0'.

Regarding the Western Digital RE3 disk (serial WD-WMASY5474089):

The disk looks fine.  The only thing of interest here is the
temperature, which is extremely high (47C).  If this is the drive which
is located in a (non-fan-cooled) enclosure, that would explain it.
There are no UDMA/CRC errors, so I'm not of the belief that there were
bad cables in use either.  Finally, there's no sign of the disk powering
on/off excessively either.  In summary, I can't explain how this disk
would fall off the bus given its condition.

Regarding the Western Digital RE3 disk (serial WD-WMASY5474727):

Similar to the first RE3 disk; everything here looks great, including
disk temperature.

I do wish the FreeBSD ATA layer would give full diagnostic messages when
encountering these conditions.  The request buffer could be printed, and
the response (error) could also be printed.  SCSI CAM's error output is
what I'd be hoping for (sans SK/ASC/ASCQ, which AFAIK ATA doesn't have).
Yes, I know this is available if you use ahci.ko, but this isn't
available to the OP.

Anyway, if heavy disk/controller load appears to be causing these
problems, you could have power-related issues.  Possibly the combination
of two disks + heavy I/O causes enough power draw that the ICH9 starts
to behave oddly.  Voltages which deviate too much can cause odd things
to happen to hardware.  If you have the time/money, you might try
replacing the PSU in your system to see if there's any improvement; your
BIOS should be able to provide you Hardware Monitoring statistics
(voltages).  Write these down before and after the PSU swap.  You don't
need to go crazy and buy a 1000W PSU or anything, but 450-750W is pretty
normal these days.

-- 
| Jeremy Chadwick   j...@parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator  Mountain View, CA, USA |
| Making life hard for others since 1977.  PGP: 4BD6C0CB |



Re: Read / write timeouts on SATA disks connected to ICH9

2010-05-14 Thread Adam Vande More
On Fri, May 14, 2010 at 12:42 PM, Pieter de Boer pie...@os3.nl wrote:

 I'm running FreeBSD 8.0-RELEASE-p1 on a Dell R300 which has an ICH9 SATA
 controller on-board (do not have the RAID controller).

 The system has 2 disks in a gmirror setup. Every now and then, probably
 under some load, one of the disks gets read or write timeouts like:
 May  5 03:01:37 aberdeen kernel: ad4: timeout waiting to issue command
 May  5 03:01:37 aberdeen kernel: ad4: error issuing WRITE_DMA48 command
 May  5 03:01:37 aberdeen kernel: GEOM_MIRROR: Request failed (error=5).
 ad4[WRITE(offset=200404975104, length=16384)]
 May  5 03:01:37 aberdeen kernel: GEOM_MIRROR: Device gm0: provider ad4
 disconnected.


Have you tried replacing/checking the cables?  Does it always happen to ad4?
 Your drive could be dying, try swapping it out and see if the errors
persist.

-- 
Adam Vande More


Re: Read / write timeouts on SATA disks connected to ICH9

2010-05-14 Thread Pieter de Boer

Adam Vande More wrote:


May  5 03:01:37 aberdeen kernel: ad4: timeout waiting to issue command
May  5 03:01:37 aberdeen kernel: ad4: error issuing WRITE_DMA48 command
May  5 03:01:37 aberdeen kernel: GEOM_MIRROR: Request failed (error=5).
ad4[WRITE(offset=200404975104, length=16384)]
May  5 03:01:37 aberdeen kernel: GEOM_MIRROR: Device gm0: provider ad4
disconnected.



Have you tried replacing/checking the cables?  Does it always happen to ad4?
 Your drive could be dying, try swapping it out and see if the errors
persist.

It happens to both drives and to both drives I replaced a month ago with 
these. Didn't replace the cables back then, but they were correctly 
attached and are now. Also it would be odd that both cables are broken 
at the same time.


--
Pieter


Re: Read / write timeouts on SATA disks connected to ICH9

2010-05-14 Thread Jeremy Chadwick
On Fri, May 14, 2010 at 07:42:33PM +0200, Pieter de Boer wrote:
 Hi list,
 
 I'm running FreeBSD 8.0-RELEASE-p1 on a Dell R300 which has an ICH9
 SATA controller on-board (do not have the RAID controller).
 
 The system has 2 disks in a gmirror setup. Every now and then,
 probably under some load, one of the disks gets read or write
 timeouts like:
 May  5 03:01:37 aberdeen kernel: ad4: timeout waiting to issue command
 May  5 03:01:37 aberdeen kernel: ad4: error issuing WRITE_DMA48 command
 May  5 03:01:37 aberdeen kernel: GEOM_MIRROR: Request failed
 (error=5). ad4[WRITE(offset=200404975104, length=16384)]
 May  5 03:01:37 aberdeen kernel: GEOM_MIRROR: Device gm0: provider
 ad4 disconnected.
 
 or:
 
 May 13 14:41:26 aberdeen kernel: ad6: TIMEOUT - READ_DMA48 retrying
 (1 retry left) LBA=975513887
 
 Sometimes the read/write succeeds after a few retries, but sometimes
 it does not, so geom_mirror throws the disk out of the mirror.
 
 Tonight ad6 was thrown out of the mirror and ad4 then gave actual
 read errors, resulting in a big mess :(
 
 My question: does anyone have experience with FreeBSD on a Dell R300
 or can anyone give me some help in trying to fix the timeouts?

Could you please do the following:

- Provide output from vmstat -i

- Provide output from dmesg | grep -i ata

- Install ports/sysutils/smartmontools (5.40 or later) and provide
  full output from commands smartctl -a /dev/ad4 and smartctl -a
  /dev/ad6
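
The three requests above can be gathered in one pass with a small script like the sketch below. The helper names and the default output directory /tmp/ata-report are assumptions; the commands themselves are the ones listed above.

```sh
#!/bin/sh
# capture OUTDIR LABEL COMMAND...: run COMMAND and save its stdout and stderr
# to OUTDIR/LABEL.txt.
capture() {
    cap_dir=$1
    cap_label=$2
    shift 2
    mkdir -p "$cap_dir" || return 1
    "$@" > "$cap_dir/$cap_label.txt" 2>&1
}

# collect_report [OUTDIR]: gather the requested output for ad4 and ad6.
collect_report() {
    report_dir=${1:-/tmp/ata-report}
    capture "$report_dir" vmstat-i vmstat -i
    capture "$report_dir" dmesg-ata sh -c 'dmesg | grep -i ata'
    capture "$report_dir" smart-ad4 smartctl -a /dev/ad4
    capture "$report_dir" smart-ad6 smartctl -a /dev/ad6
}
```

Running collect_report as root leaves four text files that can be attached to a reply as they are.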

-- 
| Jeremy Chadwick   j...@parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator  Mountain View, CA, USA |
| Making life hard for others since 1977.  PGP: 4BD6C0CB |



Re: Read / write timeouts on SATA disks connected to ICH9

2010-05-14 Thread Pieter de Boer



My question: does anyone have experience with FreeBSD on a Dell R300
or can anyone give me some help in trying to fix the timeouts?


Could you please do the following:

- Provide output from vmstat -i

- Provide output from dmesg | grep -i ata

- Install ports/sysutils/smartmontools (5.40 or later) and provide
  full output from commands smartctl -a /dev/ad4 and smartctl -a
  /dev/ad6


The ad4 SMART output is showing errors, as this disk is indeed broken 
now. It wasn't before and it is a replacement of another disk that 
wasn't broken either. Grmbl, I now see reallocated sectors on ad6 as 
well, in the smartctl output. So both disks look wonky; although afaik 
that's not the main issue here.


I've attached the smartctl output as separate files. smartmontools 5.40 
does not appear to exist; I used 5.39.1, the latest port version.


Attached also the vmstat -i and dmesg output.

--
Pieter
smartctl 5.39.1 2010-01-28 r3054 [FreeBSD 8.0-RELEASE-p1 i386] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar Black family
Device Model:     WDC WD5001AALS-00L3B2
Serial Number:    WD-WCASYA964063
Firmware Version: 01.03B01
User Capacity:    500,107,862,016 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Fri May 14 23:01:49 2010 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x85) Offline data collection activity
                                        was aborted by an interrupting command
                                        from host.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      ( 241) Self-test routine in progress...
                                        10% of test remaining.
Total time to complete Offline
data collection:                (11160) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off
                                        support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 131) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x3037) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       78
  3 Spin_Up_Time            0x0027   184   168   021    Pre-fail  Always       -       3791
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       992
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       827
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       990
192 Power-Off_Retract_Count 0x0032   199   199   000    Old_age   Always       -       989
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       992
194 Temperature_Celsius     0x0022   125   109   000    Old_age   Always       -       22
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   198   000    Old_age   Always       -       0
198 

Re: Read / write timeouts on SATA disks connected to ICH9

2010-05-14 Thread Jeremy Chadwick
On Fri, May 14, 2010 at 11:09:28PM +0200, Pieter de Boer wrote:
 The ad4 SMART output is showing errors, as this disk is indeed
 broken now. It wasn't before and it is a replacement of another disk
 that wasn't broken either. Grmbl, I now see reallocated sectors on
 ad6 as well, in the smartctl output. So both disks look wonky;
 although afaik that's not the main issue here.

Lots to say about all of this.

Focusing on drive ad4 (Western Digital):

The disk has 1 uncorrected sector (Attribute 198).  This means the drive
tried to remap it and was not successful.  This could have happened any
time during the lifetime of the drive.  There are no pending sector
reallocations (Attribute 197) (meaning there aren't others which are bad
which the drive is waiting to attempt remapping for), and there are no
remapped sectors (Attribute 5).  There have been no successful
reallocation attempts during the drive's lifetime (Attribute 196).

In general, I would say this is acceptable.  If Attribute 198 was
higher, or you had other pending sectors which needed to be remapped,
I'd say replace the disk.

UDMA/CRC error count (Attribute 199) is zero.  That's good -- it means
that most likely cabling issues can be ruled out, since the attribute
tracks the number of communication errors between the controller and the
disk PCB.
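
To make the attribute check above a bit more concrete, here is a minimal
Python sketch.  It is only an illustration: the raw values are the ones
quoted in this mail for ad4 rather than parsed from smartctl, the names
WATCHED and nonzero_attributes are made up for the example, and "flag
anything non-zero" is an assumed rule of thumb, not a vendor threshold.

```python
# Raw values for ad4 as quoted in this mail (not parsed from smartctl output).
ad4_raw = {
    5: 0,    # Reallocated_Sector_Ct: remapped sectors
    196: 0,  # Reallocated_Event_Count: successful reallocation attempts
    197: 0,  # Current_Pending_Sector: sectors waiting to be remapped
    198: 1,  # uncorrected sectors
    199: 0,  # UDMA/CRC error count
}

# Hypothetical watch list: attribute ID -> short description used in this mail.
WATCHED = {
    5: "remapped sectors",
    196: "reallocation events",
    197: "pending sectors",
    198: "uncorrected sectors",
    199: "UDMA/CRC errors",
}


def nonzero_attributes(raw_values, watched=WATCHED):
    """Return (id, description, raw) for every watched attribute that is non-zero."""
    return [
        (attr_id, watched[attr_id], raw_values[attr_id])
        for attr_id in sorted(watched)
        if raw_values.get(attr_id, 0) > 0
    ]


for attr_id, description, raw in nonzero_attributes(ad4_raw):
    print(f"Attribute {attr_id} ({description}): raw value {raw}")
```

With the values above, only Attribute 198 is reported, which matches the
reading that the single uncorrected sector is the one thing worth
watching on this drive.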

Drive temperature looks good, so nothing to worry about there.

The drive itself has detected numerous error conditions in the SMART
error log during its lifetime -- a total of 48, but SMART only lists the
most recent 5.  The drive has been online for a total of 827 hours
(Attribute 9), which we can use to determine how recently the drive
experienced said errors.  Let's examine the first 3:

 Error 48 occurred at disk power-on lifetime: 817 hours (34 days + 1 hours)
   40 51 00 9d 84 0e e0  Error: UNC at LBA = 0x000e849d = 951453
   c8 00 20 00 84 0e 00 00  00:45:18.204  READ DMA
 
 Error 47 occurred at disk power-on lifetime: 817 hours (34 days + 1 hours)
   40 51 00 0c 9d 0e e0  Error: UNC at LBA = 0x000e9d0c = 957708
   c8 00 80 00 9b 0e 00 00  00:03:08.605  READ DMA
 
 Error 46 occurred at disk power-on lifetime: 817 hours (34 days + 1 hours)
   40 51 00 9d 84 0e e0  Error: UNC at LBA = 0x000e849d = 951453
   c8 00 80 80 82 0e 00 00  00:03:05.176  READ DMA
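
As a rough illustration of where the "LBA =" values in the error lines
above come from, here is a minimal Python sketch.  It assumes the seven
bytes smartctl prints are the Error, Status, Sector Count, LBA Low, LBA
Mid, LBA High and Device/Head registers in that order, and that the
address is a 28-bit LBA whose top 4 bits sit in the low nibble of
Device/Head.  The function name decode_lba28 is made up for this example
and is not part of smartmontools.

```python
def decode_lba28(register_dump):
    """Decode a 28-bit LBA from a smartctl error-register line.

    Assumed byte order: ER ST SC SN(LBA low) CL(LBA mid) CH(LBA high) DH.
    """
    regs = [int(byte, 16) for byte in register_dump.split()]
    if len(regs) != 7:
        raise ValueError("expected 7 register bytes")
    _error, _status, _count, lba_low, lba_mid, lba_high, dev_head = regs
    # The low nibble of Device/Head carries LBA bits 27..24.
    return ((dev_head & 0x0F) << 24) | (lba_high << 16) | (lba_mid << 8) | lba_low


# Register bytes copied from the three error entries above.
error_registers = {
    48: "40 51 00 9d 84 0e e0",
    47: "40 51 00 0c 9d 0e e0",
    46: "40 51 00 9d 84 0e e0",
}

for number, dump in error_registers.items():
    lba = decode_lba28(dump)
    print(f"Error {number}: LBA = 0x{lba:08x} = {lba}")
```

Errors 48 and 46 decode to the same LBA (951453), i.e. the same sector
was hit twice.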

Okay, it's probably safe to assume these are all signs of the
uncorrected sector.  When a drive attempts an LBA remap -- which in this
case it did, but failed -- it can spend quite a bit of time doing that;
in some cases minutes, not seconds.

The drive essentially locks up during this time (from the perspective
of the SATA controller) -- it's literally spending all of its time
trying to read and re-read the LBA/sector in different ways, hoping to
get the data out of it (and/or correct it with ECC) so that it can be
written to a spare block and then internally the bad LBA won't ever be
used again.  What the OS ends up seeing in this situation is disk
timeouts.  This is completely normal.

The WD Caviar Black drives have a useful feature called TLER -- it's
disabled by default, for reasons which I don't want to get into here --
which can force the drive to internally give up after X seconds (it's
user-selectable) when dealing with such remapping/errors.  The idea is
to keep the drive from being deemed dead from the OS/controller's point
of view.  I believe Seagate, Hitachi, or Samsung (I forget which) have
this feature as well, but it's not called TLER.

Anyway, so this is probably the cause of one detachment/timeout you've
seen FreeBSD report.  Let's move on to the 2 remaining errors:

 Error 45 occurred at disk power-on lifetime: 817 hours (34 days + 1 hours)
   40 51 08 20 47 6c e0  Error: UNC at LBA = 0x006c4720 = 7096096
   c4 ff 08 ff 46 6c 00 00  00:01:09.459  READ MULTIPLE
 
 Error 44 occurred at disk power-on lifetime: 817 hours (34 days + 1 hours)
   40 51 08 21 8e 67 e0  Error: UNC at LBA = 0x00678e21 = 6786593
   c4 ff 04 3f 2f 00 00 00  00:01:00.724  READ MULTIPLE

These two happened around the same time (within 10 seconds of one another).
I'm under the impression that these are *probably* the result of the
above uncorrected sector issue, but I'm not 100% certain.  Here's why I
think that:

- The errors occurred within the same hour mark (817) as the previous 3
  errors,
- The errors happened only about 2 minutes before those 3 errors,
- The drive was in the process of executing READ MULTIPLE (cmd 0xc4),
  which tells the disk to read multiple logical sectors within 1 pass.
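
The timing argument in the list above can be checked with a small Python
sketch.  It assumes the hh:mm:ss.mmm stamps in the error log are elapsed
time since power-up and that all five errors were logged during the same
power cycle; to_ms is a helper name invented for this example.

```python
def to_ms(stamp):
    """Convert an 'hh:mm:ss.mmm' stamp to integer milliseconds."""
    hms, millis = stamp.split(".")
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + int(millis)


# Time stamps copied from the five error entries quoted in this mail.
stamps = {
    44: "00:01:00.724",
    45: "00:01:09.459",
    46: "00:03:05.176",
    47: "00:03:08.605",
    48: "00:45:18.204",
}

ordered = sorted(stamps)
for earlier, later in zip(ordered, ordered[1:]):
    gap_ms = to_ms(stamps[later]) - to_ms(stamps[earlier])
    print(f"Error {earlier} -> Error {later}: {gap_ms / 1000:.3f} s")
```

Under those assumptions, errors 44 and 45 are a little under 9 seconds
apart, and error 46 follows error 45 by just under 2 minutes.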

The ATA-8 specification states that READ MULTIPLE is a PIO command.  I'm
not sure how/why FreeBSD would be submitting this to a disk unless the
communication protocol had been downgraded from DMA to PIO.

mav@ might have some insights on this, as well as how to decode some of
the SMART error data shown.  It looks like the 48-bit read input block
is written in reverse order (word 5 to word 0).

If you want to find out the exact LBA that has the problem 

RE: Read / write timeouts on SATA disks connected to ICH9

2010-05-14 Thread Terry Kennedy
On Fri May 14 22:42:38 UTC 2010, Jeremy Chadwick wrote:
 Finally, your vmstat -i output:

  # vmstat -i
  interrupt  total   rate
  irq23: atapci0 371021299  10423

 Good to know there's no IRQ sharing going on, but what does worry me is
 the interrupt rate (10K interrupts/second).  That seems *extremely*
 high, but it also depends on what kind of disk I/O is happening on this
 system -- especially since you have 2 disks attached to the same
 controller.

I have a bunch of R300s here. Here is the output from one that is using
the on-board SATA and 2 drives in a gmirror setup (very similar to the
OP's) after 18 hours of uptime:

[0:2] speedtest:~ vmstat -i
interrupt  total   rate
irq23: atapci0254116  3

  I haven't specifically done any stress testing on this box, though I did
do a make -j8 buildworld during the initial gmirror synchronization. 8-}
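
To put the two vmstat -i samples side by side, here is a minimal Python
sketch.  It assumes the rate column is simply total interrupts divided
by uptime in seconds, truncated to an integer; that is an assumption
about how vmstat computes it, and uptime_bounds_seconds is a name made
up for this example.

```python
def uptime_bounds_seconds(total, rate):
    """Bounds on uptime implied by an integer-truncated rate = total // uptime.

    rate <= total / uptime < rate + 1, so
    total / (rate + 1) < uptime <= total / rate.
    """
    return total / (rate + 1), total / rate


# (total, rate) pairs copied from the two vmstat -i outputs in this thread.
samples = {
    "OP": (371021299, 10423),
    "speedtest": (254116, 3),
}

for name, (total, rate) in samples.items():
    low, high = uptime_bounds_seconds(total, rate)
    print(f"{name}: uptime between {low / 3600:.1f} and {high / 3600:.1f} hours")
```

Under that assumption the 18 hours of uptime on the second box fits
inside its computed range, and the OP's box would have been up for
roughly 10 hours while taking several thousand times as many interrupts
per second.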

  The drives are a pair of Dell-labeled 160GB SAMSUNG HE161HJ 1AC01121
that shipped with the box.

  I also have another R300 with Dell's SAS 6/iR card (a re-branded LSI
1068-something, seen as mpt by FreeBSD). While Dell only sells that as
part of a package deal with the hot-swap backplane and redundant power
supplies, there's no reason you couldn't pick one up on eBay and add it
yourself. You'll need some sort of breakout cable to get from the big
connector on the SAS 6 to individual SATA ports.

Terry Kennedy http://www.tmk.com
te...@tmk.com New York, NY USA