Re: PROBLEM: Buffer I/O error on device hdg1, system freeze.

Nils Radtke Fri, 18 Mar 2005 10:30:35 -0800

        Hi Bartlomiej,

Thanks for your link.


# >  hdg: dma_intr: status=0x51 { DriveReady SeekComplete Error }
# >  hdg: dma_intr: error=0x40 { UncorrectableError }, LBAsect=262311, high=0, 
low=262311, sector=262311
# >  ide: failed opcode was: unknown
# >  end_request: I/O error, dev hdg, sector 262311
# >  Buffer I/O error on device hdg1, logical block 131124
# > 
# >   fscking this disk freezes the entire system.
# > 
# >  The disk was remounted ro afterwards.
# >  Disk itself is ok. Is a new one.

# http://smartmontools.sf.net
Extract from /usr/share/doc/smartmontools/WARNINGS.gz:

SYSTEM:   Promise 20265 IDE-controller
PROBLEM:  Smartctl locks system solid when used on CDROM/DVD device
REPORTER: see link below
LINK:     http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=208964
NOTE:     Problem seems to affect kernel 2.4.21 only.


SYSTEM:   Promise IDE-controllers and perhaps others also
PROBLEM:  System freezes under heavy load, perhaps when running SMART
commands
REPORTER: Mario 'BitKoenig' Holbe [EMAIL PROTECTED]
LINK:
http://groups.google.de/groups?hl=en&lr=&ie=UTF-8&oe=UTF-8&selm=1wUXW-
2FA-9%40gated-at.bofh.it
NOTE:     Before freezing, SYSLOG shows the following message(s)
          kernel: hdf: dma timer expiry: dma status == 0xXX
          where XX is two hexidecimal digits. This may be a kernel bug
          or an underlying hardware problem.  It's not clear if
          smartmontools plays a role in provoking this problem.  FINAL
          NOTE: Problem was COMPLETELY resolved by replacing the power
          supply.  See URL above, entry on May 29, 2004 by Holbe.  Other
          things to try are exchanging cables, and cleaning PCI slots.


This sounds highly familiar and shows an at least hidden
correlation(-potential) between this kind of error and the Promise controller 
PDC drivers.
Ok, maybe I'm suffering prejudices now. We'll see.
A year ago, other disks (IBM/WD) had trouble on the PDC also, but not on onboard
controllers. And they are still spinning today. (Means, they had not to
be replaced for hard disk errors)

Fact is however, that as mailed last year, even after a complete
exchange of mainboard and processor, the problem perexists through any
kernel-version. Furthermore, countless posts indicate similar or same
symptoms.

Nevertheless, I keep the list up-to-date in case of new info.

smartctl -a /dev/hdc gives:
Error 18 occurred at disk power-on lifetime: 2249 hours (93 days + 17
hours)
  When the command that caused the error occurred, the device was active
or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 f8 a8 05 c3 e0  Error: UNC at LBA = 0x00c305a8 = 12780968

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  24 00 f8 a7 05 c3 06 00      00:08:14.850  READ SECTOR(S) EXT
  25 00 00 9f 05 c3 06 00      00:08:14.850  READ DMA EXT
  25 00 00 9f 04 c3 06 00      00:08:14.850  READ DMA EXT
  25 00 00 9f 03 c3 06 00      00:08:14.850  READ DMA EXT
  25 00 00 9f 02 c3 06 00      00:08:14.850  READ DMA EXT

Error 17 occurred at disk power-on lifetime: 2249 hours (93 days + 17
hours)
  When the command that caused the error occurred, the device was active
or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 47 06 c3 e0  Error: UNC at LBA = 0x00c30647 = 12781127

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 00 9f 05 c3 06 00      00:07:48.550  READ DMA EXT
  25 00 00 9f 04 c3 06 00      00:07:48.550  READ DMA EXT
  25 00 00 9f 03 c3 06 00      00:07:48.550  READ DMA EXT
  25 00 00 9f 02 c3 06 00      00:07:48.550  READ DMA EXT
  25 00 00 9f 01 c3 06 00      00:07:48.550  READ DMA EXT

Error 16 occurred at disk power-on lifetime: 2249 hours (93 days + 17
hours)
  When the command that caused the error occurred, the device was doing
SMART Offline or Self-test.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 20 b0 f2 57 e0  Error: UNC at LBA = 0x0057f2b0 = 5763760

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  24 00 20 af f2 57 10 00      00:43:45.600  READ SECTOR(S) EXT
  25 00 28 a7 f2 57 10 00      00:43:45.600  READ DMA EXT
  25 00 18 77 f2 57 10 00      00:43:45.600  READ DMA EXT
  25 00 18 5f 28 57 11 00      00:43:45.600  READ DMA EXT
  25 00 08 7f 10 54 10 00      00:43:45.600  READ DMA EXT

Error 15 occurred at disk power-on lifetime: 2249 hours (93 days + 17
hours)
  When the command that caused the error occurred, the device was doing
SMART Offline or Self-test.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 28 b9 f2 57 e0  Error: UNC 40 sectors at LBA = 0x0057f2b9 =
5763769

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 28 a7 f2 57 10 00      00:43:17.850  READ DMA EXT
  25 00 18 77 f2 57 10 00      00:43:17.850  READ DMA EXT
  25 00 18 5f 28 57 11 00      00:43:17.850  READ DMA EXT
  25 00 08 7f 10 54 10 00      00:43:17.850  READ DMA EXT
  25 00 08 77 28 57 11 00      00:43:17.850  READ DMA EXT

Error 14 occurred at disk power-on lifetime: 2249 hours (93 days + 17
hours)
  When the command that caused the error occurred, the device was doing
SMART Offline or Self-test.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 f8 23 3e 56 e0  Error: UNC at LBA = 0x00563e23 = 5652003

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  24 00 f8 07 3e 56 10 00      00:36:28.850  READ SECTOR(S) EXT
  25 00 00 ff 3d 56 10 00      00:36:28.850  READ DMA EXT
  25 00 00 ff 3c 56 10 00      00:36:28.850  READ DMA EXT
  25 00 00 ff 3b 56 10 00      00:36:28.850  READ DMA EXT
  25 00 00 ff 3a 56 10 00      00:36:28.850  READ DMA EXT


Could you please explain what these errors mean exactly and what may
have caused them?

Might it be possible that these transmission/xxx errors be caused 
by a bad card and/or driver?

I'm asking this as the disk never showed errors on onboard IDE ports.

        Nils

-- 
A+
* N.Radtke@                 * University of Stuttgart *    icq / lc   *
*      www.Think-Future.de  *    dep.comp.science     * 9336272/92045 *
:x                            UTM 32 0515651 5394088                 :)
   One FISHWICH coming up!! 
   
   
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: PROBLEM: Buffer I/O error on device hdg1, system freeze.

Reply via email to