Hmm, I rebooted this server for the first time since I was testing the SSD,
and it marked the SSD faulty again :( --

r...@ike ~ # fmadm faulty
--------------- ------------------------------------  -------------- ---------
TIME            EVENT-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------  -------------- ---------
Aug 19 19:46:15 091fd12e-0e26-49c4-87df-85e6b46d78fd  DISK-8000-2J   Critical

Fault class : fault.io.disk.self-test-failure
Affects     : dev:///:devid=id1,s...@sata_____ssdsa2sh032g1gn___cvem902600j6032hgn//p...@2,0/pci1022,7...@8/pci11ab,1...@1/d...@0,0
                  faulted but still in service
FRU         : "HD_ID_4" (hc://:product-id=Sun-Fire-X4500:chassis-id=0819AMT059:server-id=ike:serial=CVEM902600J6032HGN:part=SSDSA2SH032G1GN-INTEL:revision=045C8626/bay=4/disk=0)
                  faulty

I'm going to mark it as repaired and see whether it gets marked faulty again.
I never heard back from you about a possible resolution to this. Any
progress?

Thanks...


On Tue, 28 Jul 2009, Paul B. Henson wrote:

> Just wondering if you've made any further progress on handling the buggy
> Intel firmware. So far I haven't had any further fma issues with the SSD,
> but on general principle it would be nice if everything worked the way
> it's supposed to :). Until then, I'll be sure to seed the self test log
> before putting a new SSD in...
>
> On Thu, 18 Jun 2009, Paul B. Henson wrote:
>
> > On Thu, 18 Jun 2009, Eric Schrock wrote:
> >
> > > totally invalid data in response to the ATA READ LOG EXT command for log
> > > 0x07 (Extended SMART self-test log).  The spec defines that byte 0
> > > must be 0x1 and that byte 1 is reserved.
> > >
> > > You can see this from your previous smartctl output from Linux:
> >
> > Yes, I had noticed that.
> >
> > > This is apparently causing us to trip up in strange ways.  I don't know
> > > how the hardware SATL translation is not getting tripped up.  Some more
> > > investigation is necessary, but it's clear the firmware on this drive is
> > > quite broken.
> >
> > You don't happen to have a good contact at Intel I could complain to :)? I
> > somehow think my chances if I cold call their support line with this issue
> > are pretty slim to none :(.
> >
> > smartctl evidently works around this issue, in fact, on reviewing the
> > documentation, it looks like a *lot* of drives aren't exactly spec
> > compliant and there are numerous workarounds to try and do the right thing.
> > Is this something you think you would work around in Solaris code, or would
> > end resolution require Intel to fix their buggy firmware?
> >
> > Fortunately, after initiating the self tests under Linux, the incorrect
> > data being returned no longer causes a fault. And since nothing is
> > initiating self tests under Solaris, you don't really lose anything from
> > invalid self test results.
> >
> > Thanks again, and let me know if you need anything else.
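
For reference, the header rule Eric describes for log 0x07 (byte 0 must be
0x1, byte 1 is reserved) can be sketched roughly as below. This is only an
illustration, not the Solaris or smartctl code: the function name, the
512-byte page size, and the choice to treat a non-zero reserved byte as a
problem are all assumptions of mine.

```python
# Rough sketch of the header check described in the thread for ATA log
# 0x07 (Extended SMART self-test log). Names and policy are hypothetical.

EXT_SELFTEST_LOG_ADDR = 0x07  # log address named in the thread
EXPECTED_REVISION = 0x01      # "byte 0 must be 0x1"
LOG_PAGE_SIZE = 512           # assumption: one 512-byte log page


def check_ext_selftest_header(page: bytes) -> list:
    """Return a list of problems found in the first two bytes of the page."""
    if len(page) < 2:
        return ["page too short to contain a header"]

    problems = []
    if page[0] != EXPECTED_REVISION:
        problems.append(
            "byte 0 is 0x%02x, expected 0x%02x" % (page[0], EXPECTED_REVISION)
        )
    if page[1] != 0:
        # Assumption: the thread only says byte 1 is reserved; flagging a
        # non-zero value here is a policy choice made for this sketch.
        problems.append("reserved byte 1 is 0x%02x, expected 0x00" % page[1])
    return problems


if __name__ == "__main__":
    # Made-up page contents, not data read from a real drive.
    sample = bytes([0xFF, 0x07]) + bytes(LOG_PAGE_SIZE - 2)
    for problem in check_ext_selftest_header(sample):
        print(problem)
```

A caller that wanted to tolerate non-compliant firmware could treat a
non-empty result as "self-test log unavailable" rather than as a disk
fault, but whether that is the right policy is a separate question.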

-- 
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  hen...@csupomona.edu
California State Polytechnic University  |  Pomona CA 91768
_______________________________________________
fm-discuss mailing list
fm-discuss@opensolaris.org
