Re: [CentOS] Server offline :-( please help to repair software RAID

2011-05-01 Thread Alexander Farber
Hello Mark and others,

On Thu, Apr 28, 2011 at 10:21 PM,  m.r...@5-cent.us wrote:
 At this point, I'd run the long test on each drive and (after coming back
 an hour or two later) see the results.

I have that dreadful warning again -

/etc/cron.weekly/99-raid-check:
   WARNING: mismatch_cnt is not 0 on /dev/md0

By "the long tests" do you mean some Linux command
I could run while booted into rescue mode?

Or do you mean inserting Seagate/WD/whatever CD?

(Because Strato.de people refuse to do the latter -
I only pay EUR 29 + 59/month, locked until Dec.,
why would they do anything for me /sarcasm)

Regards
Alex

PS: below is my disk info:


# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
  1023936 blocks [2/2] [UU]

md2 : active raid1 sdb5[1] sda5[0]
  277728192 blocks [2/2] [UU]

md3 : active raid1 sdb6[1] sda6[0]
  185151360 blocks [2/2] [UU]

md1 : active raid1 sdb3[1] sda3[0]
  20479936 blocks [2/2] [UU]

unused devices: <none>


# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/md1   20G  1.7G   17G   9% /
/dev/md3  176G  6.2G  161G   4% /var
/dev/md0  993M   42M  901M   5% /boot
/dev/md2  263G  2.0G  248G   1% /home
tmpfs 2.0G 0  2.0G   0% /dev/shm


Re: [CentOS] Server offline :-( please help to repair software RAID

2011-05-01 Thread Markus Falb
On 1.5.2011 08:52, Alexander Farber wrote:
 Hello Mark and others,
 
 On Thu, Apr 28, 2011 at 10:21 PM,  
 m.roth-x6lchvbuigd1p9xltph...@public.gmane.org wrote:
 At this point, I'd run the long test on each drive and (after coming back
 an hour or two later) see the results.
 
 I have that dreadful warning again -
 
 /etc/cron.weekly/99-raid-check:
WARNING: mismatch_cnt is not 0 on /dev/md0

This does not necessarily mean that something is wrong.
Writes do not occur at exactly the same time; there is a short
timespan where data has been written to disk A but not yet to disk B. So it
is possible that two mirrored blocks hold different data.

http://marc.info/?l=linux-raid&m=117555829807542&w=2
http://marc.info/?l=linux-raid&m=117304688617916&w=2
https://bugzilla.redhat.com/show_bug.cgi?id=566828
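
For reference, a check (and, if needed, a repair pass) can also be triggered
by hand through the md sysfs interface; a sketch, using the array name from
this thread:

# echo check > /sys/block/md0/md/sync_action
# cat /sys/block/md0/md/mismatch_cnt
# echo repair > /sys/block/md0/md/sync_action

"check" is a read-only scrub that recomputes mismatch_cnt; "repair" rewrites
mismatched blocks (on RAID1 it copies one side over the other, so it makes
the mirror consistent rather than picking the "correct" copy).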

 By "the long tests" do you mean some Linux command
 I could run while booted into rescue mode?

$ smartctl -t long /dev/sdX

No need for rescue mode.
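
Once the long test completes (typically an hour or two on disks this size),
the results can be read back with, for example:

$ smartctl -l selftest /dev/sdX
$ smartctl -H /dev/sdX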

-- 
Kind Regards, Markus Falb





[CentOS] Server offline :-( please help to repair software RAID

2011-04-28 Thread Alexander Farber
Hello,

for weeks I have been ignoring this warning on my CentOS 5.6/64-bit machine -

/etc/cron.weekly/99-raid-check:
WARNING: mismatch_cnt is not 0 on /dev/md0

in the hope that the software RAID would slowly repair itself.

I had also executed echo 10 > /proc/sys/dev/raid/speed_limit_max
on advice from this mailing list.
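
For reference, the current resync throttle values can be read back the same
way (they are in KiB/s per device):

# cat /proc/sys/dev/raid/speed_limit_min
# cat /proc/sys/dev/raid/speed_limit_max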

But now my web server is offline - I had to boot it remotely into a rescue system.

Does anybody have advice on what commands to run,
and do you think it is a RAID problem at all?

# dmesg
Linux version 2.6.34 (root@imagemaster30) (gcc version 4.3.2 (Debian
4.3.2-1.1) ) #20 SMP Mon Jul 19 18:35:15 CEST 2010
Command line: ramdisk_size=81920 initrd=rescue-image-2.6-64
root=/dev/ram BOOT_IMAGE=rescue-kernel-2.6-64
BIOS-provided physical RAM map:
 BIOS-e820:  - 0009f000 (usable)
 BIOS-e820: 0009f000 - 000a (reserved)
 BIOS-e820: 000e4000 - 0010 (reserved)
 BIOS-e820: 0010 - ddfb (usable)
 BIOS-e820: ddfb - ddfbe000 (ACPI data)
 BIOS-e820: ddfbe000 - ddfe (ACPI NVS)
 BIOS-e820: ddfe - ddfee000 (reserved)
 BIOS-e820: ddff - de00 (reserved)
 BIOS-e820: ff70 - 0001 (reserved)
 BIOS-e820: 0001 - 00012000 (usable)
NX (Execute Disable) protection: active
DMI present.
AMI BIOS detected: BIOS may corrupt low RAM, working around it.
e820 update range:  - 0001 (usable) == (reserved)
e820 update range:  - 1000 (usable) == (reserved)
e820 remove range: 000a - 0010 (usable)
No AGP bridge found
last_pfn = 0x12 max_arch_pfn = 0x4
MTRR default type: uncachable
MTRR fixed ranges enabled:
  0-9 write-back
  A-E uncachable
  F-F write-protect
MTRR variable ranges enabled:
  0 base  mask 8000 write-back
  1 base 8000 mask C000 write-back
  2 base C000 mask E000 write-back
  3 disabled
  4 disabled
  5 disabled
  6 disabled
  7 disabled
TOM2: 00012000 aka 4608M
x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106
e820 update range: e000 - 0001 (usable) == (reserved)
last_pfn = 0xddfb0 max_arch_pfn = 0x4
initial memory mapped : 0 - 2000
found SMP MP-table at [880ff780] ff780
Using GB pages for direct mapping
init_memory_mapping: -ddfb
 00 - 00c000 page 1G
 00c000 - 00dde0 page 2M
 00dde0 - 00ddfb page 4k
kernel direct mapping tables up to ddfb @ 12000-15000
init_memory_mapping: 0001-00012000
 01 - 012000 page 2M
kernel direct mapping tables up to 12000 @ 14000-16000
RAMDISK: 7d792000 - 8000
ACPI: RSDP 000faf80 00014 (v00 ACPIAM)
ACPI: RSDT ddfb 0003C (v01 032510 RSDT1503 20100325 MSFT 0097)
ACPI: FACP ddfb0200 00084 (v02 032510 FACP1503 20100325 MSFT 0097)
ACPI: DSDT ddfb0440 0447E (v01  A96B3 A96B3210 0210 INTL 20051117)
ACPI: FACS ddfbe000 00040
ACPI: APIC ddfb0390 0006C (v01 032510 APIC1503 20100325 MSFT 0097)
ACPI: MCFG ddfb0400 0003C (v01 032510 OEMMCFG  20100325 MSFT 0097)
ACPI: OEMB ddfbe040 00071 (v01 032510 OEMB1503 20100325 MSFT 0097)
ACPI: HPET ddfb48c0 00038 (v01 032510 OEMHPET  20100325 MSFT 0097)
ACPI: SSDT ddfb4900 0088C (v01 A M I  POWERNOW 0001 AMD  0001)
ACPI: Local APIC address 0xfee0
Scanning NUMA topology in Northbridge 24
No NUMA configuration found
Faking a node at -00012000
Initmem setup node 0 -00012000
  NODE_DATA [0001 - 00014fff]
 [ea00-ea0003ff] PMD -
[88010020-880103bf] on node 0
Zone PFN ranges:
  DMA  0x0010 - 0x1000
  DMA320x1000 - 0x0010
  Normal   0x0010 - 0x0012
Movable zone start PFN for each node
early_node_map[3] active PFN ranges
0: 0x0010 - 0x009f
0: 0x0100 - 0x000ddfb0
0: 0x0010 - 0x0012
On node 0 totalpages: 1040191
  DMA zone: 56 pages used for memmap
  DMA zone: 0 pages reserved
  DMA zone: 3927 pages, LIFO batch:0
  DMA32 zone: 14280 pages used for memmap
  DMA32 zone: 890856 pages, LIFO batch:31
  Normal zone: 1792 pages used for memmap
  Normal zone: 129280 pages, LIFO batch:31
ACPI: PM-Timer IO Port: 0x808
ACPI: Local APIC address 0xfee0
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
ACPI: LAPIC (acpi_id[0x03] lapic_id[0x02] enabled)
ACPI: LAPIC (acpi_id[0x04] lapic_id[0x03] enabled)
ACPI: IOAPIC (id[0x04] address[0xfec0] gsi_base[0])
IOAPIC[0]: apic_id 4, version 33, address 0xfec0, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: 

Re: [CentOS] Server offline :-( please help to repair software RAID

2011-04-28 Thread Alexander Farber
Additional info (how many RAID arrays do I have??):

# mdadm -D /dev/md3
/dev/md3:
Version : 00.90
  Creation Time : Sat Mar 19 22:53:25 2011
 Raid Level : raid1
 Array Size : 185151360 (176.57 GiB 189.59 GB)
  Used Dev Size : 185151360 (176.57 GiB 189.59 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 3
Persistence : Superblock is persistent

Update Time : Thu Apr 28 21:09:12 2011
  State : clean, resyncing
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

 Rebuild Status : 38% complete

   UUID : 1b3668a3:4b6c5593:3d186b3c:53958f34
 Events : 0.15

Number   Major   Minor   RaidDevice State
   0       8        6        0      active sync   /dev/sda6
   1       8       22        1      active sync   /dev/sdb6


Re: [CentOS] Server offline :-( please help to repair software RAID

2011-04-28 Thread Digimer
On 04/28/2011 03:10 PM, Alexander Farber wrote:
  Rebuild Status : 38% complete

That's potentially promising. What does 'cat /proc/mdstat' show? Did you
have to recover the array, or were you able to use /etc/mdadm.conf?
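
If the file is missing, it can usually be regenerated from the running
arrays; a common one-liner, once the arrays are assembled, is:

# mdadm --detail --scan > /etc/mdadm.conf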

-- 
Digimer
E-Mail: digi...@alteeve.com
AN!Whitepapers: http://alteeve.com
Node Assassin:  http://nodeassassin.org


Re: [CentOS] Server offline :-( please help to repair software RAID

2011-04-28 Thread Alexander Farber
Hello, I didn't touch anything, just booted the hoster's rescue image.

# cat /etc/mdadm.conf
cat: /etc/mdadm.conf: No such file or directory


# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1]
md0 : active raid1 sda1[0] sdb1[1]
  1023936 blocks [2/2] [UU]

md1 : active raid1 sda3[0] sdb3[1]
  20479936 blocks [2/2] [UU]
resync=DELAYED

md2 : active raid1 sda5[0] sdb5[1]
  277728192 blocks [2/2] [UU]

md3 : active raid1 sda6[0] sdb6[1]
  185151360 blocks [2/2] [UU]
  [=================>...]  resync = 85.3% (158109056/185151360) finish=5.3min speed=83532K/sec

unused devices: <none>

# mdadm -D /dev/md3
/dev/md3:
Version : 00.90
  Creation Time : Sat Mar 19 22:53:25 2011
 Raid Level : raid1
 Array Size : 185151360 (176.57 GiB 189.59 GB)
  Used Dev Size : 185151360 (176.57 GiB 189.59 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 3
Persistence : Superblock is persistent

Update Time : Thu Apr 28 21:23:48 2011
  State : active, resyncing
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

 Rebuild Status : 85% complete

   UUID : 1b3668a3:4b6c5593:3d186b3c:53958f34
 Events : 0.31

Number   Major   Minor   RaidDevice State
   0       8        6        0      active sync   /dev/sda6
   1       8       22        1      active sync   /dev/sdb6


Re: [CentOS] Server offline :-( please help to repair software RAID

2011-04-28 Thread Les Mikesell
On 4/28/2011 2:07 PM, Alexander Farber wrote:
 Hello,

 for weeks I have been ignoring this warning on my CentOS 5.6/64-bit machine -

  /etc/cron.weekly/99-raid-check:
  WARNING: mismatch_cnt is not 0 on /dev/md0

 in the hope that the software RAID would slowly repair itself.

 I had also executed echo 10 > /proc/sys/dev/raid/speed_limit_max
 on advice from this mailing list.

 But now my web server is offline - I had to boot it remotely into a rescue
 system.

 Does anybody have advice on what commands to run,
 and do you think it is a RAID problem at all?


A 'cat /proc/mdstat' should show the state of the raid mirroring.  I 
don't see anything that would explain not booting, though.  Raid1 works 
normally even when only one member is available and should continue to 
work while rebuilding.  Maybe the problem that caused the mismatch has 
corrupted the drive the system normally boots from.
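
From the rescue system you could also sanity-check the root filesystem
before rebooting; a read-only run would be something like:

# fsck -n /dev/md1

(-n answers "no" to every prompt, so it reports problems without changing
anything.)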

-- 
   Les Mikesell
lesmikes...@gmail.com




Re: [CentOS] Server offline :-( please help to repair software RAID

2011-04-28 Thread Michel van Deventer
Hi,

what is the output of 'cat /proc/mdstat'?

A healthy RAID should look something like this:
[root@janeway ~]# cat /proc/mdstat 
Personalities : [raid1] 
md2 : active raid1 sdb1[0] sda1[1]
  256896 blocks [2/2] [UU]
  
md0 : active raid1 sdd1[0] sdc1[1]
  1465135936 blocks [2/2] [UU]
  
md3 : active raid1 sdb3[1] sda3[0]
  730218432 blocks [2/2] [UU]

I have 3 RAID1 arrays (over 4 disks).

On Thu, 2011-04-28 at 21:10 +0200, Alexander Farber wrote:
 Additional info (how many RAID arrays do I have??):
 
 # mdadm -D /dev/md3
 /dev/md3:
 Version : 00.90
   Creation Time : Sat Mar 19 22:53:25 2011
  Raid Level : raid1
  Array Size : 185151360 (176.57 GiB 189.59 GB)
   Used Dev Size : 185151360 (176.57 GiB 189.59 GB)
Raid Devices : 2
   Total Devices : 2
 Preferred Minor : 3
 Persistence : Superblock is persistent
 
 Update Time : Thu Apr 28 21:09:12 2011
   State : clean, resyncing
  Active Devices : 2
 Working Devices : 2
  Failed Devices : 0
   Spare Devices : 0
 
  Rebuild Status : 38% complete
 
UUID : 1b3668a3:4b6c5593:3d186b3c:53958f34
  Events : 0.15
 
 Number   Major   Minor   RaidDevice State
    0       8        6        0      active sync   /dev/sda6
    1       8       22        1      active sync   /dev/sdb6




Re: [CentOS] Server offline :-( please help to repair software RAID

2011-04-28 Thread Digimer
On 04/28/2011 03:26 PM, Alexander Farber wrote:
 Hello, I didn't touch anything, just booted the hoster's rescue image.
 
 # cat /etc/mdadm.conf
 cat: /etc/mdadm.conf: No such file or directory
 
 
 # cat /proc/mdstat
 Personalities : [linear] [raid0] [raid1]
 md0 : active raid1 sda1[0] sdb1[1]
   1023936 blocks [2/2] [UU]
 
 md1 : active raid1 sda3[0] sdb3[1]
   20479936 blocks [2/2] [UU]
 resync=DELAYED
 
 md2 : active raid1 sda5[0] sdb5[1]
   277728192 blocks [2/2] [UU]
 
 md3 : active raid1 sda6[0] sdb6[1]
   185151360 blocks [2/2] [UU]
   [=================>...]  resync = 85.3% (158109056/185151360) finish=5.3min speed=83532K/sec
 
 unused devices: <none>

I'd wait for it to finish and then try rebooting normally. Post back
after md1 and md3 have completed syncing.

-- 
Digimer
E-Mail: digi...@alteeve.com
AN!Whitepapers: http://alteeve.com
Node Assassin:  http://nodeassassin.org


Re: [CentOS] Server offline :-( please help to repair software RAID

2011-04-28 Thread Michel van Deventer
Hi,

On Thu, 2011-04-28 at 21:26 +0200, Alexander Farber wrote:
 Hello, I didn't touch anything, just booted the hoster's rescue image.
Cool :)

 # cat /proc/mdstat
 Personalities : [linear] [raid0] [raid1]
 md0 : active raid1 sda1[0] sdb1[1]
   1023936 blocks [2/2] [UU]
 
 md1 : active raid1 sda3[0] sdb3[1]
   20479936 blocks [2/2] [UU]
 resync=DELAYED
 
 md2 : active raid1 sda5[0] sdb5[1]
   277728192 blocks [2/2] [UU]
 
 md3 : active raid1 sda6[0] sdb6[1]
   185151360 blocks [2/2] [UU]
   [=================>...]  resync = 85.3% (158109056/185151360) finish=5.3min speed=83532K/sec
 
Let md3 rebuild, wait for md1 to rebuild (check regularly with
cat /proc/mdstat), and then reboot your machine without the rescue image;
it should come up again.
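
Instead of polling, mdadm can also block until a resync finishes, assuming
your mdadm version supports the --wait misc option:

# mdadm --wait /dev/md1 /dev/md3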

Regards,

Michel





Re: [CentOS] Server offline :-( please help to repair software RAID

2011-04-28 Thread Digimer
On 04/28/2011 03:31 PM, Michel van Deventer wrote:
 Hi,
 
 On Thu, 2011-04-28 at 21:26 +0200, Alexander Farber wrote:
 Hello, I didn't touch anything, just booted the hoster's rescue image.
 Cool :)
 
 # cat /proc/mdstat
 Personalities : [linear] [raid0] [raid1]
 md0 : active raid1 sda1[0] sdb1[1]
   1023936 blocks [2/2] [UU]

 md1 : active raid1 sda3[0] sdb3[1]
   20479936 blocks [2/2] [UU]
 resync=DELAYED

 md2 : active raid1 sda5[0] sdb5[1]
   277728192 blocks [2/2] [UU]

 md3 : active raid1 sda6[0] sdb6[1]
   185151360 blocks [2/2] [UU]
   [=================>...]  resync = 85.3% (158109056/185151360) finish=5.3min speed=83532K/sec

 Let md3 rebuild, wait for md1 to rebuild (check regularly with
 cat /proc/mdstat), and then reboot your machine without the rescue image;
 it should come up again.
 
   Regards,
 
   Michel

Run 'watch cat /proc/mdstat'. :)

-- 
Digimer
E-Mail: digi...@alteeve.com
AN!Whitepapers: http://alteeve.com
Node Assassin:  http://nodeassassin.org


Re: [CentOS] Server offline :-( please help to repair software RAID

2011-04-28 Thread Alexander Farber
Thank you all, it seems to have finished - I'm rebooting.

Just curious: why is the State of md3 "active", while the others are "clean"?


# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1]
md0 : active raid1 sda1[0] sdb1[1]
  1023936 blocks [2/2] [UU]

md1 : active raid1 sda3[0] sdb3[1]
  20479936 blocks [2/2] [UU]
  [=================>...]  resync = 86.6% (17746816/20479936) finish=0.3min speed=131514K/sec

md2 : active raid1 sda5[0] sdb5[1]
  277728192 blocks [2/2] [UU]

md3 : active raid1 sda6[0] sdb6[1]
  185151360 blocks [2/2] [UU]

unused devices: <none>

...Then after some wait:...


# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1]
md0 : active raid1 sda1[0] sdb1[1]
  1023936 blocks [2/2] [UU]

md1 : active raid1 sda3[0] sdb3[1]
  20479936 blocks [2/2] [UU]

md2 : active raid1 sda5[0] sdb5[1]
  277728192 blocks [2/2] [UU]

md3 : active raid1 sda6[0] sdb6[1]
  185151360 blocks [2/2] [UU]

unused devices: <none>


# mdadm -D /dev/md3
/dev/md3:
Version : 00.90
  Creation Time : Sat Mar 19 22:53:25 2011
 Raid Level : raid1
 Array Size : 185151360 (176.57 GiB 189.59 GB)
  Used Dev Size : 185151360 (176.57 GiB 189.59 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 3
Persistence : Superblock is persistent

Update Time : Thu Apr 28 21:31:12 2011
  State : active
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

   UUID : 1b3668a3:4b6c5593:3d186b3c:53958f34
 Events : 0.39

Number   Major   Minor   RaidDevice State
   0       8        6        0      active sync   /dev/sda6
   1       8       22        1      active sync   /dev/sdb6


# mdadm -D /dev/md1
/dev/md1:
Version : 00.90
  Creation Time : Sat Mar 19 22:52:20 2011
 Raid Level : raid1
 Array Size : 20479936 (19.53 GiB 20.97 GB)
  Used Dev Size : 20479936 (19.53 GiB 20.97 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 1
Persistence : Superblock is persistent

Update Time : Thu Apr 28 21:33:56 2011
  State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

   UUID : 8812725b:ea156bc6:3d186b3c:53958f34
 Events : 0.48

Number   Major   Minor   RaidDevice State
   0       8        3        0      active sync   /dev/sda3
   1       8       19        1      active sync   /dev/sdb3


# mdadm -D /dev/md0
/dev/md0:
Version : 00.90
  Creation Time : Sat Mar 19 22:52:12 2011
 Raid Level : raid1
 Array Size : 1023936 (1000.11 MiB 1048.51 MB)
  Used Dev Size : 1023936 (1000.11 MiB 1048.51 MB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 0
Persistence : Superblock is persistent

Update Time : Thu Apr 28 21:06:24 2011
  State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

   UUID : 87db17c2:d806a38c:3d186b3c:53958f34
 Events : 0.14

Number   Major   Minor   RaidDevice State
   0       8        1        0      active sync   /dev/sda1
   1       8       17        1      active sync   /dev/sdb1


# mdadm -D /dev/md2
/dev/md2:
Version : 00.90
  Creation Time : Sat Mar 19 22:52:32 2011
 Raid Level : raid1
 Array Size : 277728192 (264.86 GiB 284.39 GB)
  Used Dev Size : 277728192 (264.86 GiB 284.39 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 2
Persistence : Superblock is persistent

Update Time : Thu Apr 28 21:17:54 2011
  State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

   UUID : 2db0174b:e45768d5:3d186b3c:53958f34
 Events : 0.14

Number   Major   Minor   RaidDevice State
   0       8        5        0      active sync   /dev/sda5
   1       8       21        1      active sync   /dev/sdb5


Re: [CentOS] Server offline :-( please help to repair software RAID

2011-04-28 Thread Alexander Farber
On the 2nd try it booted and seems to work.

The /var/log/mcelog is (and was) empty.

# sudo cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
  1023936 blocks [2/2] [UU]

md2 : active raid1 sdb5[1] sda5[0]
  277728192 blocks [2/2] [UU]

md3 : active raid1 sdb6[1] sda6[0]
  185151360 blocks [2/2] [UU]

md1 : active raid1 sdb3[1] sda3[0]
  20479936 blocks [2/2] [UU]

unused devices: <none>


Below is the output from the remote console of my hoster.

If you notice anything or have any advice, please share.




GNU GRUB  version 0.97  (636K lower / 3635904K upper memory)

 +-+
 | CentOS (2.6.18-238.9.1.el5) |
 | CentOS (2.6.18-238.5.1.el5) |
 | CentOS (2.6.18-194.32.1.el5)|
 | CentOS 5|
 | CentOS 5 Disk 2 |
 | |
 | |
 | |
 | |
 | |
 | |
 | |
 +-+
  Use the ^ and v keys to select which entry is highlighted.
  Press enter to boot the selected OS, 'e' to edit the
  commands before booting, 'a' to modify the kernel arguments
  before booting, or 'c' for a command-line.

   The highlighted entry will be booted automatically in 1 seconds.
  Booting 'CentOS (2.6.18-238.9.1.el5)'

root (hd0,0)
 Filesystem type is ext2fs, partition type 0xfd
kernel /vmlinuz-2.6.18-238.9.1.el5 root=/dev/md1 console=tty0 console=ttyS0,57600
   [Linux-bzImage, setup=0x1e00, size=0x1fd9fc]
initrd /initrd-2.6.18-238.9.1.el5.img
   [Linux-initrd @ 0x37d5f000, 0x290aac bytes]

Linux version 2.6.18-238.9.1.el5 (mockbu...@builder10.centos.org) (gcc
version 4.1.2 20080704 (Red Hat 4.1.2-50)) #1 SMP Tue Apr 12 18:10:13
EDT 2011
Command line: root=/dev/md1 console=tty0 console=ttyS0,57600
BIOS-provided physical RAM map:
 BIOS-e820: 0001 - 0009f000 (usable)
 BIOS-e820: 0009f000 - 000a (reserved)
 BIOS-e820: 000e4000 - 0010 (reserved)
 BIOS-e820: 0010 - ddfb (usable)
 BIOS-e820: ddfb - ddfbe000 (ACPI data)
 BIOS-e820: ddfbe000 - ddfe (ACPI NVS)
 BIOS-e820: ddfe - ddfee000 (reserved)
 BIOS-e820: ddff - de00 (reserved)
 BIOS-e820: ff70 - 0001 (reserved)
 BIOS-e820: 0001 - 00012000 (usable)
DMI present.
No NUMA configuration found
Faking a node at -00012000
Bootmem setup node 0 -00012000
Memory for crash kernel (0x0 to 0x0) notwithin permissible range
disabling kdump
ACPI: PM-Timer IO Port: 0x808
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
Processor #0 0:4 APIC version 16
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
Processor #1 0:4 APIC version 16
ACPI: LAPIC (acpi_id[0x03] lapic_id[0x02] enabled)
Processor #2 0:4 APIC version 16
ACPI: LAPIC (acpi_id[0x04] lapic_id[0x03] enabled)
Processor #3 0:4 APIC version 16
ACPI: IOAPIC (id[0x04] address[0xfec0] gsi_base[0])
IOAPIC[0]: apic_id 4, version 33, address 0xfec0, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level)
Setting APIC routing to flat
ACPI: HPET id: 0x8300 base: 0xfed0
Using ACPI (MADT) for SMP configuration information
Nosave address range: 0009f000 - 000a
Nosave address range: 000a - 000e4000
Nosave address range: 000e4000 - 0010
Nosave address range: ddfb - ddfbe000
Nosave address range: ddfbe000 - ddfe
Nosave address range: ddfe - ddfee000
Nosave address range: ddfee000 - ddff
Nosave address range: ddff - de00
Nosave address range: de00 - ff70
Nosave address range: ff70 - 0001
Allocating PCI resources starting at e000 (gap: de00:2170)
SMP: Allowing 4 CPUs, 0 hotplug CPUs
Built 1 zonelists.  Total pages: 1022573
Kernel command line: root=/dev/md1 console=tty0 console=ttyS0,57600
Initializing CPU#0
PID hash table entries: 4096 (order: 12, 32768 bytes)
Console: colour VGA+ 

Re: [CentOS] Server offline :-( please help to repair software RAID

2011-04-28 Thread m . roth
Alexander Farber wrote:
 On the 2nd try it booted and seems to work.

 The /var/log/mcelog is (and was) empty.

To be expected - I'd expect this to be a h/d error. Check your logfiles for
info from smartd.

mark



Re: [CentOS] Server offline :-( please help to repair software RAID

2011-04-28 Thread Scott Silva
on 4/28/2011 12:40 PM Alexander Farber spake the following:
 Thank you all, it seems to have finished - I'm rebooting.
 
 Just curious: why is the State of md3 "active", while the others are "clean"?
 
If I remember right, clean means it is completely synced and not being written
to or mounted. Active means it is being (or has been) written to and is synced.
Usually dirty means that there is un-synced data on one drive or the other.
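
The kernel's current view can also be read straight from sysfs (on
reasonably recent kernels), e.g.:

# cat /sys/block/md3/md/array_state

which should report "clean" or "active" for a healthy array.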



Re: [CentOS] Server offline :-( please help to repair software RAID

2011-04-28 Thread Michel van Deventer
On Thu, 2011-04-28 at 21:52 +0200, Alexander Farber wrote:
 On the 2nd try it booted and seems to work.
Did it give an error on the first try, and if so, which one?

You should check /var/log/messages for i/o errors and check your disks
with smartctl.
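
For example, something along these lines (device names as elsewhere in
this thread):

# grep -i 'i/o error' /var/log/messages
# smartctl -H /dev/sda
# smartctl -H /dev/sdb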

I have had my RAID1 arrays rebuild sometimes without any reason known to me.
I even had a defective network card kernel-panic the machine for two hours,
and the arrays were still working afterwards ;)

Regards,

Michel




Re: [CentOS] Server offline :-( please help to repair software RAID

2011-04-28 Thread Alexander Farber
It turned out smartd kept saying that it had no entries in smartd.conf.

I've copied smartd.rpmnew over smartd.conf and restarted it;
now I have (in /var/log/messages, date+hostname removed):

smartd version 5.38 [x86_64-redhat-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
Opened configuration file /etc/smartd.conf
Configuration file /etc/smartd.conf was parsed, found DEVICESCAN,
scanning devices
Problem creating device name scan list
Device: /dev/sda, opened
Device /dev/sda: using '-d sat' for ATA disk behind SAT layer.
Device: /dev/sda, opened
Device: /dev/sda, not found in smartd database.
Device: /dev/sda, is SMART capable. Adding to monitor list.
Device: /dev/sdb, opened
Device /dev/sdb: using '-d sat' for ATA disk behind SAT layer.
Device: /dev/sdb, opened
Device: /dev/sdb, not found in smartd database.
Device: /dev/sdb, is SMART capable. Adding to monitor list.
Monitoring 0 ATA and 2 SCSI devices
smartd has fork()ed into background mode. New PID=3427.

And the /etc/smartd.conf contains:
DEVICESCAN -H -m root
and the rest are comments.

Do you think it is configured okay this way?
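
That default only polls the overall SMART health status (-H) and mails root
on failure. If you also want smartd to schedule the self-tests itself, a
common variant (see man smartd.conf; the -s regex below runs a long
self-test every Sunday between 02:00 and 03:00) would be:

DEVICESCAN -H -m root -s L/../../7/02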

My disk info is:

# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/md1   20G  1.7G   17G   9% /
/dev/md3  176G  7.0G  160G   5% /var
/dev/md0  993M   42M  901M   5% /boot
/dev/md2  263G  2.0G  248G   1% /home
tmpfs 2.0G 0  2.0G   0% /dev/shm

#  cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
  1023936 blocks [2/2] [UU]

md2 : active raid1 sdb5[1] sda5[0]
  277728192 blocks [2/2] [UU]

md3 : active raid1 sdb6[1] sda6[0]
  185151360 blocks [2/2] [UU]

md1 : active raid1 sdb3[1] sda3[0]
  20479936 blocks [2/2] [UU]

unused devices: <none>


Re: [CentOS] Server offline :-( please help to repair software RAID

2011-04-28 Thread m . roth
Alexander Farber wrote:
 It turned out smartd kept saying that it had no entries in smartd.conf.

 I've copied smartd.rpmnew over smartd.conf and restarted it;
 now I have (in /var/log/messages, date+hostname removed):
<snip>
At this point, I'd run the long test on each drive and (after coming back
an hour or two later) see the results.

mark (yes, it does take that long)
