From: David Purton <dcpur...@marshwiggle.net>
To: debian-user@lists.debian.org
Cc:
Bcc:
Subject: Re: Disk performance deteriated to unbearable levels
Reply-To:
In-Reply-To: <20111108181428.gb13...@hysteria.proulx.com>
X-GPG-Fingerprint: 2D6A A66E F9DC E86A 876F 062D 16D7 EA32 EE08 09EC
X-GPG-Public-Key: http://marshwiggle.net/~dcpurton/pubkey.asc
X-URL: http://marshwiggle.net/~dcpurton/

Hi Bob,

Thanks for your detailed answer!

On Tue, Nov 08, 2011 at 11:14:28AM -0700, Bob Proulx wrote:
> David Purton wrote:
> > Everything takes forever to load (including booting), but then runs ok
> > once loaded.
>
> Could DMA be disabled now?  Taking a long time to read initially but
> running okay afterward would match that symptom.  Because after the
> initial read it should be in filesystem buffer cache.

hdparm neither lets me get nor set the dma mode (HDIO_SET_DMA failed:
Inappropriate ioctl for device).

But I think DMA is enabled on the disk. From dmesg:

[ 1.808090] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 1.809113] ata1.00: unexpected _GTF length (8)
[ 1.809432] ata1.00: ATA-8: Hitachi HTS545025B9A300, PB2OC60N, max UDMA/133
[ 1.809439] ata1.00: 488397168 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
[ 1.810564] ata1.00: unexpected _GTF length (8)
[ 1.810883] ata1.00: configured for UDMA/133
[ 1.811163] scsi 0:0:0:0: Direct-Access ATA Hitachi HTS54502 PB2O PQ: 0 ANSI: 5
[ 1.823891] sd 0:0:0:0: [sda] 488397168 512-byte logical blocks: (250 GB/232 GiB)
[ 1.824139] sd 0:0:0:0: [sda] Write Protect is off
[ 1.824149] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[ 1.824251] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

> > My only guess is that it is filesystem related, but I am not sure how to
> > confirm this, nor why things would have got to the present situation.
> >
> > Currently, the root/system partition is 20GB, with 50% used. /home has
> > only 44% used.
>
> That seems like a good amount of free space available for the
> filesystem to deal with disk fragmentation.
>
> > Any suggestions?

Ha!
I just found some disk-related errors in syslog:

Nov 2 12:10:58 swires kernel: [33736.415350] sd 0:0:0:0: [sda] Unhandled error code
Nov 2 12:10:58 swires kernel: [33736.415367] sd 0:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
Nov 2 12:10:58 swires kernel: [33736.415376] sd 0:0:0:0: [sda] CDB: Write(10): 2a 00 16 48 77 76 00 01 d0 00
Nov 2 12:10:58 swires kernel: [33736.415395] end_request: I/O error, dev sda, sector 373847926
Nov 2 12:10:58 swires kernel: [33736.415404] Buffer I/O error on device sda7, logical block 15133136
Nov 2 12:10:58 swires kernel: [33736.415409] lost page write due to I/O error on sda7
Nov 2 12:10:58 swires kernel: [33736.415415] Buffer I/O error on device sda7, logical block 15133137
Nov 2 12:10:58 swires kernel: [33736.415420] lost page write due to I/O error on sda7
Nov 2 12:10:58 swires kernel: [33736.415427] Buffer I/O error on device sda7, logical block 15133138

I'm guessing this is bad! :(

However, I can find limited details on Google.

Both partitions are seemingly affected, so I guess disk problems are
more likely than file system :(.

*sigh*

> Since others suggested possible hard drive problems...  What does
> smartctl say about the health of your drive?
>
>   smartctl -H /dev/sda

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

> Any selftest failures?  Are you running smartctl selftests?  If not
> then please do.  I always run selftests regularly to get feedback
> about the drives.  Let me suggest something similar to this in
> /etc/smartd.conf so as to have these run automatically.
>
>   # Monitor all attributes, enable automatic online data collection,
>   # automatic Attribute autosave, and start a short self-test every day
>   # between 2-3am, and a long self test Saturdays between 3-4am.
>   # On failure run all installed scripts (to send notification email).
>   # Ignore attribute 194 temperature change.
>   # Ignore attribute 190 airflow temperature change.
>   /dev/sda -a -o on -S on -s (S/../../[1-5]/03|L/../../6/03) -I 194 -I 190 -m root -M exec /usr/share/smartmontools/smartd-runner
>
> This will dump the selftests.  Any failures?
>
>   smartctl -l selftest /dev/sda

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      3686         -

> If you need to manually run selftests:
>
>   smartctl -t short /dev/sda
>
> If short passes pick a time and run:
>
>   smartctl -t long /dev/sda

Haven't done this yet.

> You might try using 'hdparm' to produce some data for your disk.  Read
> the hdparm documentation first (lots of docs on the web such as this)
>
>   http://www.gentoo-wiki.info/Hdparm#Benchmarking_devices
>
> and then you might try this on an otherwise idle system.

As far as I can tell, its defaults are reasonably optimal.

# hdparm -I /dev/sda

/dev/sda:

ATA device, with non-removable media
    Model Number:       Hitachi HTS545025B9A300
    Serial Number:      091204PB42061SDBAXTL
    Firmware Revision:  PB2OC60N
    Transport:          Serial, ATA8-AST, SATA 1.0a, SATA II Extensions, SATA Rev 2.5, SATA Rev 2.6; Revision: ATA8-AST T13 Project D1697 Revision 0b
Standards:
    Used: unknown (minor revision code 0x0028)
    Supported: 8 7 6 5
    Likely used: 8
Configuration:
    Logical         max     current
    cylinders       16383   16383
    heads           16      16
    sectors/track   63      63
    --
    CHS current addressable sectors:   16514064
    LBA    user addressable sectors:  268435455
    LBA48  user addressable sectors:  488397168
    Logical/Physical Sector size:           512 bytes
    device size with M = 1024*1024:      238475 MBytes
    device size with M = 1000*1000:      250059 MBytes (250 GB)
    cache/buffer size  = 7208 KBytes (type=DualPortCache)
    Form Factor: 2.5 inch
    Nominal Media Rotation Rate: 5400
Capabilities:
    LBA, IORDY(can be disabled)
    Queue depth: 32
    Standby timer values: spec'd by Vendor, no device specific minimum
    R/W multiple sector transfer: Max = 16  Current = 16
    Advanced power management level: 128
    Recommended acoustic management value: 128, current value: 254
    DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6
         Cycle time: min=120ns recommended=120ns
    PIO: pio0 pio1 pio2 pio3 pio4
         Cycle time: no flow control=120ns  IORDY flow control=120ns
Commands/features:
    Enabled Supported:
       *    SMART feature set
            Security Mode feature set
       *    Power Management feature set
       *    Write cache
       *    Look-ahead
       *    Host Protected Area feature set
       *    WRITE_BUFFER command
       *    READ_BUFFER command
       *    NOP cmd
       *    DOWNLOAD_MICROCODE
       *    Advanced Power Management feature set
            Power-Up In Standby feature set
       *    SET_FEATURES required to spinup after power up
            SET_MAX security extension
            Automatic Acoustic Management feature set
       *    48-bit Address feature set
       *    Device Configuration Overlay feature set
       *    Mandatory FLUSH_CACHE
       *    FLUSH_CACHE_EXT
       *    SMART error logging
       *    SMART self-test
       *    General Purpose Logging feature set
       *    WRITE_{DMA|MULTIPLE}_FUA_EXT
       *    64-bit World wide name
       *    IDLE_IMMEDIATE with UNLOAD
       *    WRITE_UNCORRECTABLE_EXT command
       *    {READ,WRITE}_DMA_EXT_GPL commands
       *    Segmented DOWNLOAD_MICROCODE
       *    Gen1 signaling speed (1.5Gb/s)
       *    Gen2 signaling speed (3.0Gb/s)
       *    Native Command Queueing (NCQ)
       *    Host-initiated interface power management
       *    Phy event counters
       *    NCQ priority information
            Non-Zero buffer offsets in DMA Setup FIS
       *    DMA Setup Auto-Activate optimization
            Device-initiated interface power management
            In-order data delivery
       *    Software settings preservation
       *    SMART Command Transport (SCT) feature set
       *    SCT LBA Segment Access (AC2)
       *    SCT Error Recovery Control (AC3)
       *    SCT Features Control (AC4)
       *    SCT Data Tables (AC5)
Security:
    Master password revision code = 65534
            supported
    not     enabled
    not     locked
            frozen
    not     expired: security count
            supported: enhanced erase
    82min for SECURITY ERASE UNIT. 84min for ENHANCED SECURITY ERASE UNIT.
Logical Unit WWN Device Identifier: 5000cca5e8d3409c
    NAA         : 5
    IEEE OUI    : 000cca
    Unique ID   : 5e8d3409c
Checksum: correct

>   # hdparm -tT /dev/sda
>   Timing cached reads:   4634 MB in  2.00 seconds = 2320.31 MB/sec
>   Timing buffered disk reads:  378 MB in  3.01 seconds = 125.71 MB/sec

/dev/sda:
 Timing cached reads:   1462 MB in  2.00 seconds = 731.55 MB/sec
 Timing buffered disk reads:  250 MB in  3.02 seconds =  82.71 MB/sec

> Lastly you could benchmark the filesystem (a layer on top of the disk
> system) using bonnie/bonnie++.
>
> > Both are ext3
>
> Directly on the disk partition (e.g. /dev/sda5)?  Or on top of LVM?
> Or on top of RAID (e.g. /dev/md1)?  Or LVM on RAID?

Directly on the disk partition.

> > I do not want to reinstall if at all possible.
>
> I am always an advocate of upgrades not re-installs. :-)

I have a bad feeling about this one :(

David

-- 
David Purton
dcpur...@marshwiggle.net

For the eyes of the LORD range throughout the earth to
strengthen those whose hearts are fully committed to him.
                                     2 Chronicles 16:9a
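
A note on the syslog excerpt quoted in this message: the "CDB: Write(10)" line and the "end_request ... sector" line can be cross-checked by hand. What follows is only a rough sketch, assuming the standard 10-byte SCSI CDB layout (opcode in byte 0, big-endian LBA in bytes 2-5, big-endian transfer length in bytes 7-8). The helper name decode_rw10 is made up for illustration and is not part of any tool mentioned in this thread.

```python
# Minimal sketch: decode the WRITE(10) CDB bytes that the kernel logged.
# Assumed layout (standard 10-byte CDB): byte 0 opcode, byte 1 flags,
# bytes 2-5 big-endian LBA, byte 6 group number, bytes 7-8 big-endian
# transfer length in logical blocks, byte 9 control.

def decode_rw10(cdb_hex: str) -> dict:
    """Hypothetical helper: parse a space-separated hex dump of a 10-byte CDB."""
    cdb = bytes.fromhex(cdb_hex.replace(" ", ""))
    if len(cdb) != 10:
        raise ValueError("expected a 10-byte CDB")
    return {
        "opcode": cdb[0],
        "lba": int.from_bytes(cdb[2:6], "big"),
        "blocks": int.from_bytes(cdb[7:9], "big"),
    }

LOGGED_CDB = "2a 00 16 48 77 76 00 01 d0 00"  # from the syslog excerpt
LOGGED_SECTOR = 373847926                     # from the end_request line
SECTOR_SIZE = 512                             # logical block size per dmesg

info = decode_rw10(LOGGED_CDB)
print("opcode:", hex(info["opcode"]))         # 0x2a is WRITE(10)
print("starting LBA:", info["lba"])
print("blocks requested:", info["blocks"])
print("matches end_request sector:", info["lba"] == LOGGED_SECTOR)
print("write size in KiB:", info["blocks"] * SECTOR_SIZE // 1024)
```

Under those assumptions the decoded starting LBA is 373847926, the same sector the kernel reports in the end_request line, and the transfer length is 464 blocks (232 KiB with 512-byte sectors). So the burst of "Buffer I/O error" lines appears to belong to one timed-out write request rather than to many separate failures.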