On 06/08/09 19:40, Paul B. Henson wrote:
On Mon, 8 Jun 2009, Eric Schrock wrote:

/usr/lib/fm/libdiskstatus.so.1 and does a
disk_status_open()/disk_status_get()/nvlist_print().

        self-test-failure = (embedded nvlist)
        nvlist version: 0
                result-code = 0xe
                timestamp = 0xea00
                segment = 0x0
                address = 0xea00ea00ea
        (end self-test-failure)

If I'm reading this right, the self-test result code is 0xe? Unless the
copy of the spec I found is out of date, that's a reserved value and not
currently defined, which would lead one to believe the fault lies with the
SSD.
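
For reference, here is a minimal sketch of how that 4-bit result field can be classified. It assumes the self-test results table as I read it in SPC-3 (values 0x8 through 0xe reserved), so check it against your own copy of the spec. The names SELFTEST_RESULTS and describe_result are made up for illustration and are not part of libdiskstatus.

```python
# Meanings of the 4-bit SELF-TEST RESULTS field, as read from the SPC-3 table.
# Assumption: 0x8 through 0xe are reserved; verify against your copy of the spec.
SELFTEST_RESULTS = {
    0x0: "completed without error",
    0x1: "aborted by SEND DIAGNOSTIC",
    0x2: "aborted by other means",
    0x3: "unknown error, could not complete",
    0x4: "failed, segment unknown",
    0x5: "failed in first segment",
    0x6: "failed in second segment",
    0x7: "failed in another segment",
    0xF: "self-test in progress",
}


def describe_result(code):
    """Return a description of a self-test result code (low 4 bits only)."""
    code &= 0x0F
    # Anything not in the table is treated as reserved.
    return SELFTEST_RESULTS.get(code, "reserved (0x%x)" % code)


if __name__ == "__main__":
    print(describe_result(0xE))
```

Under that table, 0xe is not one of the defined failure codes, which is why a reserved value points at either bad data from the drive or a parsing problem rather than a genuine self-test failure.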

My understanding is that the new X4540 SSD is a relabeled X25-E, so presumably
it works correctly with fma. Anyone played with one of those yet? I wonder
what changes Sun might have made to the firmware. Doesn't look like there are
any firmware updates out yet for the X25-E from Intel.

selecting the most recent entry correctly.  I'd recommend poking around
with a debugger and seeing why this function believes that the self-test
has failed.

If I'm understanding the output correctly, it probably thinks it failed
because the self-test result is invalid, and I need to either RMA the drive
or go yell at Intel. Although another person I spoke with who has an X25-E
says his is reported as failing selftest as well, which would indicate a
general issue with the drive and not a specific failure with my unit.

Yes, that's quite strange. It's also possible that the code to walk the individual log parameters is somehow getting out of sync, and we're walking off into outer space. Certainly the predominance of '0xea' is quite suspicious. I would set a breakpoint in logpage_selftest_analyze() in MDB and do something (after stepping over the pushl/movl to set up the stack frame) like:

        $C
        ... get args to func ...
        <arg1>,<arg2>::dump

Here the arguments are numbered from zero, so 'arg0' is the first argument and <arg1> and <arg2> are the second and third. This will dump out the raw data. From there, we can walk over the parameter entries by hand and see if they look legitimately strange. This may still be a software bug. Someday it would also be nice to rewrite libdiskstatus to leverage libscsi; it would eliminate a large amount of custom code that only makes things more difficult.
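
As a rough illustration of what walking the parameter entries by hand looks like, here is a minimal sketch that checks whether the walk stays in sync. It is not the libdiskstatus code. It assumes the SPC-3 layout for the self-test results log page as I understand it: each parameter has a 4-byte header (2-byte parameter code, control byte, length byte) followed by 0x10 data bytes, with parameter codes counting up from 1. It also assumes the dumped buffer starts at the first parameter header rather than at the page header. The function name walk_selftest_params is hypothetical.

```python
import struct

HDR_LEN = 4       # parameter code (2 bytes), control byte, parameter length
PARAM_LEN = 0x10  # assumed data length of every self-test results parameter


def walk_selftest_params(buf):
    """Walk log parameters in buf (bytes); return (entries, problems)."""
    entries, problems = [], []
    off, expected_code = 0, 1
    # Trailing bytes too short to hold a header are ignored.
    while off + HDR_LEN <= len(buf):
        code, _ctl, plen = struct.unpack_from(">HBB", buf, off)
        if code != expected_code:
            problems.append("offset %d: parameter code 0x%x, expected 0x%x"
                            % (off, code, expected_code))
        if plen != PARAM_LEN:
            # The length cannot be trusted, so any further walking would be
            # out of sync. Stop here.
            problems.append("offset %d: parameter length 0x%x, expected 0x%x"
                            % (off, plen, PARAM_LEN))
            break
        if off + HDR_LEN + plen > len(buf):
            problems.append("offset %d: parameter runs past end of buffer" % off)
            break
        body = buf[off + HDR_LEN:off + HDR_LEN + plen]
        # Bytes 2-3 of the body: timestamp; bytes 4-11: address of first failure.
        timestamp, address = struct.unpack_from(">HQ", body, 2)
        entries.append({
            "code": code,
            "result": body[0] & 0x0F,  # low nibble: self-test result
            "segment": body[1],        # self-test (segment) number
            "timestamp": timestamp,
            "address": address,
        })
        off += HDR_LEN + plen
        expected_code += 1
    return entries, problems


if __name__ == "__main__":
    # A buffer full of the suspicious 0xea 0x00 pattern fails immediately.
    print(walk_selftest_params(bytes([0xEA, 0x00] * 10)))
```

If the bytes from ::dump produce problems on the very first entry, the drive is returning data that does not look like a self-test results page at all. If the first few entries parse cleanly and the 0xea pattern only shows up later, that points more toward the walk getting out of sync.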

- Eric


--
Eric Schrock, Fishworks                    http://blogs.sun.com/eschrock
_______________________________________________
fm-discuss mailing list
fm-discuss@opensolaris.org
