Package: libatasmart4
Version: 0.17+git20100219-1
Severity: important
Tags: upstream patch

libatasmart incorrectly interprets SMART data in a way that causes it to
report healthy hard drives as failing. There are at least two specific
problems centering on reallocated sectors.

First problem: It assumes that the raw value is exactly equal to the number
of reallocated sectors. According to the man page from smartctl:

  The conversion from Raw value to a quantity with physical units is not
  specified by the SMART standard. In most cases, the values printed by
  smartctl are sensible. For example the temperature Attribute generally
  has its raw value equal to the temperature in Celsius. However in some
  cases vendors use unusual conventions. For example the Hitachi disk on
  my laptop reports its power-on hours in minutes, not hours. Some IBM
  disks track three temperatures rather than one, in their raw values.
  And so on.

Examples can be found of drives that report a raw value that is almost
certainly not the real number of reallocated sectors. Some solid state
drives like the Samsung MCCOE64GEMPP have a raw value of over 2 million out
of the box. The Hitachi HTS541010G9SA00 seems to have similarly large raw
values.

Second problem: It uses arbitrarily selected thresholds for reallocated
sectors to determine that a drive is at risk, even when the SMART data
clearly indicates that the threshold selected by the manufacturer is far
from being reached. There are two of these arbitrary thresholds. First, if
the apparent number of reallocated sectors exceeds log2 of the disk size, it
reports it as failing with "many" bad sectors. Second, if the apparent
number of reallocated sectors or pending reallocations is greater than 0, it
reports the disk as failing with bad sectors. Keep in mind that this "number
of reallocated sectors" may be no such thing.

This problem becomes particularly apparent when gnome-disk-utility begins
showing alarming pop-up warnings at every login. The issue has been reported
several times in different distributions.

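To make the two thresholds above concrete, here is a minimal standalone
sketch. It is not code from libatasmart: the names verdict_t, floor_log2,
raw_heuristic and vendor_failing are made up for illustration. It assumes
that the disk size is counted in 512-byte sectors before taking log2, and
that a vendor threshold of 0 means the attribute can never fail.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names throughout; an illustration, not libatasmart code. */
typedef enum {
    VERDICT_GOOD,
    VERDICT_BAD_SECTOR,
    VERDICT_BAD_SECTOR_MANY
} verdict_t;

/* Floor of log2(n); returns 0 for n <= 1. */
static uint64_t floor_log2(uint64_t n) {
    uint64_t r = 0;

    while (n > 1) {
        n >>= 1;
        r++;
    }
    return r;
}

/* The raw-value heuristic as described in this report: the raw value is
 * trusted to be a sector count and compared against two fixed cutoffs.
 * Assumption: the disk size is measured in 512-byte sectors. */
static verdict_t raw_heuristic(uint64_t raw_bad, uint64_t disk_bytes) {
    uint64_t many;

    assert(disk_bytes > 0);
    many = floor_log2(disk_bytes / 512);

    if (raw_bad > many)
        return VERDICT_BAD_SECTOR_MANY;
    if (raw_bad > 0)
        return VERDICT_BAD_SECTOR;
    return VERDICT_GOOD;
}

/* The vendor's own criterion: the normalized value has dropped to or
 * below the threshold stored on the drive.
 * Assumption: a threshold of 0 means "never fails". */
static int vendor_failing(uint8_t current, uint8_t threshold) {
    return threshold > 0 && current <= threshold;
}
```

For a 100 GB disk the "many" cutoff works out to 27, so a drive whose raw
value is 2000000 out of the box is classed as VERDICT_BAD_SECTOR_MANY by
raw_heuristic(), while vendor_failing(100, 5) (illustrative normalized value
and threshold) still says the attribute is fine.
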
Here are some of the earlier reports:

https://bugzilla.redhat.com/show_bug.cgi?id=498115
https://bugzilla.redhat.com/show_bug.cgi?id=500079
https://bugzilla.redhat.com/show_bug.cgi?id=506254
https://bugs.launchpad.net/ubuntu/+source/libatasmart/+bug/438136
https://bugs.launchpad.net/ubuntu/+source/gnome-disk-utility/+bug/477280
http://bugs.freedesktop.org/show_bug.cgi?id=25772

Additionally, the upstream author seems reluctant to address this problem,
so I suggest carrying a patch in Debian in the meantime. The author's
response can be seen in the freedesktop.org bug above.

I'm attaching a proposed patch. It does three things:

 * removes the warnings in skdump for when the raw value of
   reallocated-sector-count or current-pending-sector is greater than 0;
 * prevents the BAD_SECTOR_MANY and BAD_SECTOR overall status flags from
   being used;
 * disables the code that becomes unused as a result of the other changes.

Note that suppressing BAD_SECTOR and BAD_SECTOR_MANY only prevents the
arbitrary warnings. Actual SMART failures are still reported.

-- System Information:
Debian Release: squeeze/sid
  APT prefers testing
  APT policy: (990, 'testing'), (300, 'unstable')
Architecture: i386 (i686)

Kernel: Linux 2.6.32-3-686 (SMP w/2 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages libatasmart4 depends on:
ii  libc6                         2.10.2-9   Embedded GNU C Library: Shared lib
ii  libudev0                      154-1      libudev shared library

libatasmart4 recommends no packages.

libatasmart4 suggests no packages.

-- no debconf information

diff -u -r libatasmart-0.17+git20100219.orig//atasmart.c libatasmart-0.17+git20100219/atasmart.c
--- libatasmart-0.17+git20100219.orig//atasmart.c	2010-06-06 12:07:45.000000000 -0700
+++ libatasmart-0.17+git20100219/atasmart.c	2010-06-07 15:18:29.000000000 -0700
@@ -1265,11 +1265,6 @@
                 if (max_sectors > 0 && a->pretty_value > max_sectors) {
                         a->pretty_value = SK_SMART_ATTRIBUTE_UNIT_UNKNOWN;
                         d->attribute_verification_bad = TRUE;
-                } else {
-                        if ((!strcmp(a->name, "reallocated-sector-count") ||
-                             !strcmp(a->name, "current-pending-sector")) &&
-                            a->pretty_value > 0)
-                                a->warn = TRUE;
                 }
         }
 
@@ -2106,6 +2101,7 @@
                 *good = FALSE;
 }
 
+#if 0
 static uint64_t u64log2(uint64_t n) {
         unsigned r;
 
@@ -2120,10 +2116,13 @@
                 r++;
         }
 }
+#endif
 
 int sk_disk_smart_get_overall(SkDisk *d, SkSmartOverall *overall) {
         SkBool good;
+#if 0
         uint64_t sectors, sector_threshold;
+#endif
 
         assert(d);
         assert(overall);
@@ -2137,6 +2136,7 @@
                 return 0;
         }
 
+#if 0
         /* Second, check if the number of bad sectors is greater than
          * a certain threshold */
         if (sk_disk_smart_get_bad(d, &sectors) < 0) {
@@ -2154,6 +2154,7 @@
                         return 0;
                 }
         }
+#endif
 
         /* Third, check if any of the SMART attributes is bad */
         good = TRUE;
@@ -2165,11 +2166,13 @@
                 return 0;
         }
 
+#if 0
         /* Fourth, check if there are any bad sectors at all */
         if (sectors > 0) {
                 *overall = SK_SMART_OVERALL_BAD_SECTOR;
                 return 0;
         }
+#endif
 
         /* Fifth, check if any of the SMART attributes ever was bad */
         good = TRUE;
@@ -2380,10 +2383,8 @@
         if (sk_disk_smart_get_bad(d, &value) < 0)
                 printf("Bad Sectors: %s\n", strerror(errno));
         else
-                printf("%sBad Sectors: %s%s\n",
-                       value > 0 ? HIGHLIGHT : "",
-                       print_value(pretty, sizeof(pretty), value, SK_SMART_ATTRIBUTE_UNIT_SECTORS),
-                       value > 0 ? ENDHIGHLIGHT : "");
+                printf("Bad Sectors: %s\n",
+                       print_value(pretty, sizeof(pretty), value, SK_SMART_ATTRIBUTE_UNIT_SECTORS));
 
         if (sk_disk_smart_get_power_on(d, &power_on) < 0) {
                 printf("Powered On: %s\n", strerror(errno));