Re: SMART threshold exceeded

2005-03-27 Thread Mike Tancsa
On Sat, 26 Mar 2005 21:19:31 -0800, in sentex.lists.freebsd.hardware
you wrote:

i suspect that this

twe0: AEN: twe0: port 0: SMART threshold exceeded

is not a good sign.  googling tells me it's a failure, but
not whether it is soft or hard.


The disk on Port 0 in this case.   The smartmontools can speak to
disks behind a 3ware both on RELENG_4 and RELENG_5 to get more info.

smartctl -a -d 3ware,0 /dev/twed1
or to talk to the disk on port 1
smartctl -a -d 3ware,1 /dev/twed1

eg

backup2# smartctl -a -d 3ware,0 /dev/twed1
smartctl version 5.33 [i386-unknown-freebsd4.9] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     ST380011A
Serial Number:    3JV3WT64
Firmware Version: 3.16
User Capacity:    80,000,000,000 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   6
ATA Standard is:  ATA/ATAPI-6 T13 1410D revision 2
Local Time is:    Sun Mar 27 14:03:21 2005 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 ( 430) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        No General Purpose Logging support.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (  58) minutes.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   063   060   006    Pre-fail  Always       -       15260897
  3 Spin_Up_Time            0x0003   098   097   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       0
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   080   060   030    Pre-fail  Always       -       116204100
  9 Power_On_Hours          0x0032   089   089   000    Old_age   Always       -       10357
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       30
194 Temperature_Celsius     0x0022   042   049   000    Old_age   Always       -       42
195 Hardware_ECC_Recovered  0x001a   063   060   000    Old_age   Always       -       15260897
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x       100   253   000    Old_age   Offline      -       0
202 TA_Increase_Count       0x0032   100   253   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error     00%          9451        -
# 2  Short offline       Completed without error     00%          9427        -
# 3  Short offline       Completed without error     00%          9404        -
# 4  Short offline       Completed without error     00%          9380        -
# 5  Short offline       Completed without error     00%          9357        -
# 6  Short offline       Completed without error     00%          9333        -
# 7  Short offline       Completed without
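
To read the attribute table above at a glance, here is a rough sketch that computes how far each normalized VALUE sits above its THRESH. It assumes the usual convention that an attribute trips when VALUE is at or below a nonzero THRESH, which is presumably what the controller's "SMART threshold exceeded" message reflects. The function name margins and the regular expression are made up for illustration and are not part of smartmontools; the sample rows are copied from the output above, one attribute per line.

```python
import re

# Rows copied from the smartctl output above, one attribute per line.
SAMPLE = """\
  1 Raw_Read_Error_Rate     0x000f   063   060   006    Pre-fail  Always   -   15260897
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always   -   0
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always   -   0
194 Temperature_Celsius     0x0022   042   049   000    Old_age   Always   -   42
"""

# ID, name, flag (possibly truncated), VALUE, WORST, THRESH, TYPE
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]*\s+(\d+)\s+(\d+)\s+(\d+)\s+(\S+)"
)


def margins(text):
    """Return (name, VALUE - THRESH, type) for attributes with a nonzero THRESH."""
    out = []
    for line in text.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # skip headers and anything that is not an attribute row
        value, thresh = int(m.group(3)), int(m.group(5))
        if thresh == 0:
            continue  # assumption: a zero threshold can never be tripped
        out.append((m.group(2), value - thresh, m.group(6)))
    return out


if __name__ == "__main__":
    for name, margin, kind in margins(SAMPLE):
        status = "AT OR BELOW THRESHOLD" if margin <= 0 else "ok"
        print(f"{name:24} margin {margin:4d}  {kind:8}  {status}")
```

A small margin on a Pre-fail attribute is the thing to watch for. None of the rows in this sample is at or below its threshold, which matches the PASSED health result above.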

Re: SMART threshold exceeded

2005-03-27 Thread Randy Bush
 twe0: AEN: twe0: port 0: SMART threshold exceeded
 It's the disk connected to port 0 of twe0. (As you might
 have found out already ...)

# ./tw_cli
3ware CLI alarms

Alarms Report for Controller /c0
Date   Severity  Alarm Message
--------------------------------------------------------
N/A    N/A       INFO: Soft reset occurred
N/A    N/A       WARNING: Sector repair occurred: Port #0
N/A    N/A       WARNING: Sector repair occurred: Port #0
N/A    N/A       WARNING: Sector repair occurred: Port #0
N/A    N/A       WARNING: SMART threshold exceeded: Port #0

which gives me the clue
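
In the report above, three sector repairs and the SMART warning all name port 0. A rough sketch for tallying such alarms per port from pasted tw_cli output follows. The function name tally and the regular expression are illustrative assumptions, and the line format is taken only from the report above, not from any 3ware documentation.

```python
import re
from collections import Counter

# Alarm messages copied from the tw_cli report above.
ALARMS = """\
INFO: Soft reset occurred
WARNING: Sector repair occurred: Port #0
WARNING: Sector repair occurred: Port #0
WARNING: Sector repair occurred: Port #0
WARNING: SMART threshold exceeded: Port #0
"""

# severity, message text, port number
PORT_ALARM = re.compile(r"(\w+): (.+): Port #(\d+)")


def tally(text):
    """Count alarms keyed by (port, message); lines without a port are skipped."""
    counts = Counter()
    for line in text.splitlines():
        m = PORT_ALARM.search(line)
        if m:
            counts[(int(m.group(3)), m.group(2))] += 1
    return counts


if __name__ == "__main__":
    for (port, message), n in sorted(tally(ALARMS).items()):
        print(f"port {port}: {message} x{n}")
```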

___
freebsd-hardware@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hardware
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: SMART threshold exceeded

2005-03-27 Thread Mike Tancsa
At 08:55 PM 27/03/2005, Randy Bush wrote:
 The disk on Port 0 in this case.   The smartmontools can speak to
 disks behind a 3ware both on RELENG_4 and RELENG_5 to get more info.
but maybe not on 6-current?
The driver is the same. Perhaps try version 5.33 from CVS on SourceForge?
---Mike 



Weird hardware instability

2005-03-27 Thread Peter Jeremy
I've been experimenting with using a 13MHz oven crystal oscillator to
replace the 14.318MHz master reference in an old Asus P5A-B
motherboard.  (The board uses an ICS8148-53 clock generator).  If I
set TIMER_FREQ=1083342 and overclock by 10% then the CPU is running
very close to nominal.
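
The arithmetic behind those numbers can be sketched as follows. It assumes the i8254 timer clock is the reference divided by 12 (as with the stock 14.31818 MHz crystal) and that the 10% overclock scales the CPU clock linearly with the reference. The constant and function names are made up for illustration.

```python
NOMINAL_REF_HZ = 14_318_180  # stock PC reference crystal, 14.31818 MHz
NEW_REF_HZ = 13_000_000      # 13 MHz oscillator from the message
TIMER_DIVISOR = 12           # assumption: i8254 clock = reference / 12
OVERCLOCK = 1.10             # the 10% overclock from the message


def timer_freq(ref_hz, divisor=TIMER_DIVISOR):
    """Timer input frequency in Hz for a given reference frequency."""
    return ref_hz / divisor


def cpu_ratio(new_ref_hz, overclock, nominal_ref_hz=NOMINAL_REF_HZ):
    """CPU clock relative to nominal, assuming it scales with the reference."""
    return new_ref_hz * overclock / nominal_ref_hz


if __name__ == "__main__":
    print("stock timer freq:", round(timer_freq(NOMINAL_REF_HZ)))
    print("13 MHz timer freq:", round(timer_freq(NEW_REF_HZ)))
    deviation = (cpu_ratio(NEW_REF_HZ, OVERCLOCK) - 1) * 100
    print(f"CPU clock vs nominal: {deviation:+.2f}%")
```

An ideal 13 MHz source gives a timer frequency of about 1083333 Hz. The TIMER_FREQ=1083342 above differs from that by roughly 8 ppm, presumably reflecting the measured frequency of this particular oscillator.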

I'm running 5.3p5 and using make buildworld as a stress test and
have found some fairly weird behaviour: If I let the system boot
normally from power-on then it is unstable - the buildworld will crash
with internal compiler errors and I eventually wind up with a panic.
If I manually reset the system then it becomes rock solid - it has
been doing make -j 4 buildworld in a loop for about 4 days without
a problem but as soon as I do a power-on restart, it becomes unstable
again.  The system was reliable before I started, so it's presumably
something I've done but I can't see why a power-on reset should have
different behaviour to pressing the reset button.  Does anyone have
any suggestions as to the cause?

If anyone wants more detail on what I've done, feel free to ask - I
will probably post details at some stage.

-- 
Peter Jeremy