Re: [CentOS] CentOS 6.2 on partitionable mdadm RAID1 (md_d0) - kernel panic with either disk not present

2012-10-15 Thread Arun Khan
__ SOLVED __

On Sat, Jun 23, 2012 at 10:14 AM, Arun Khan knu...@gmail.com wrote:

> I have not had a thorough look at the dracut script though.


I also posted this problem on the mdadm mailing list but could not get
it resolved there.

So I did some searching on the suspect candidate, 'dracut'.

After some more searching I found these two bug reports:
CentOS 6.2 http://bugs.centos.org/view.php?id=5400
CentOS 6.3 http://bugs.centos.org/view.php?id=5970

Using System Rescue CD and mounting the disk image files, I appended
'rdshell' to the kernel line in grub.conf.
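
The grub.conf edit can also be scripted from the rescue environment.
What follows is a minimal sketch, not the exact commands used here:
add_rdshell is a made-up helper name, and it assumes a GRUB legacy
grub.conf whose boot entries use lines starting with 'kernel'.

```shell
#!/bin/sh
# Sketch only: append 'rdshell' to every kernel line of a GRUB legacy
# grub.conf that does not already carry it.
# add_rdshell is a hypothetical helper name, not part of any package.
add_rdshell() {
    conf="$1"
    # Write to a temporary copy first so a failed sed cannot truncate
    # the real file; replace the original only on success.
    sed -e '/^[[:space:]]*kernel[[:space:]]/{' \
        -e '/rdshell/!s/[[:space:]]*$/ rdshell/' \
        -e '}' "$conf" > "$conf.tmp" && mv "$conf.tmp" "$conf"
}
```

Called with the path to grub.conf (which depends on where the rescue
system mounted the disk image), it leaves title and initrd lines alone
and changes nothing on a second run.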

With 'rdshell' one can at least do the following to get the system operational.

Boot the system with one disk failed or removed.

At the rdshell prompt:

# mdadm --run /dev/md_d0
(replace device name with your device name)

# cat /proc/mdstat
(make sure your raid device is active with one member failure)

Press CTRL-D
(exit the rdshell)

The system will boot with md_d0 in degraded mode.

Log in to the system.

# yum update dracut
(dependency dracut-kernel is pulled in)

As of this writing it is dracut-004-284.el6_3.1.noarch

# cd /boot
# dracut <initramfs file name> <kernel version>

Update grub and reboot.

The system now boots when either disk has failed.
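
The rebuild step can be sketched as a small script. This is only an
illustration under assumptions: initramfs_path and rebuild_cmd are
hypothetical helper names, the image is assumed to follow the stock
CentOS 6 naming /boot/initramfs-<kernel version>.img, and the script
only prints the dracut command instead of running it.

```shell
#!/bin/sh
# Sketch only: build the dracut command line for one kernel version.
# initramfs_path and rebuild_cmd are hypothetical helper names.

# Assumed stock CentOS 6 image name: /boot/initramfs-<kernel version>.img
initramfs_path() {
    printf '/boot/initramfs-%s.img\n' "$1"
}

# Print (do not run) the dracut call; -f overwrites an existing image.
rebuild_cmd() {
    kver="$1"
    printf 'dracut -f %s %s\n' "$(initramfs_path "$kver")" "$kver"
}

# Dry run for the running kernel; review the line, then run it as root.
rebuild_cmd "$(uname -r)"
```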

-- Arun Khan
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] CentOS 6.2 on partitionable mdadm RAID1 (md_d0) - kernel panic with either disk not present

2012-06-22 Thread Arun Khan
On Thu, Jun 21, 2012 at 11:40 PM, Scott Silva ssi...@sgvwater.com wrote:
> on 6/20/2012 11:34 PM Arun Khan spake the following:
>> On Thu, Jun 21, 2012 at 10:09 AM, Rob Kampen
>
> Just a shot in the dark... DO all the fstab entries call out md devices?


Yes, /etc/fstab contains /dev/md_d0p1 for / partition.

I have been doing some digging in the initramfs and the dracut script.

The initramfs does contain all the md related stuff like drivers, the
devices for md_d0 and /etc/mdadm.conf. To the best of my
knowledge these should be sufficient to load /dev/md_d0p1 (/).
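
One way to double-check that is to filter the image listing for
md-related files. A rough sketch, assuming lsinitrd (shipped with
dracut) is available; md_entries is a hypothetical helper name and the
three patterns are only a guess at what matters for RAID1.

```shell
#!/bin/sh
# Sketch only: keep the lines of an initramfs listing that end in an
# md-related file name. md_entries is a hypothetical helper; the
# patterns (mdadm binary, mdadm.conf, raid1.ko) are assumptions.
md_entries() {
    grep -E '(mdadm|mdadm\.conf|raid1\.ko)$'
}

# Typical use on the installed system:
#   lsinitrd /boot/initramfs-$(uname -r).img | md_entries
```

An empty result would point at a missing driver or config file; a
non-empty one says nothing about whether the array is actually started.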

I have not had a thorough look at the dracut script though.

I will post any relevant information I find, along with anything that I
don't quite understand.

-- Arun Khan


Re: [CentOS] CentOS 6.2 on partitionable mdadm RAID1 (md_d0) - kernel panic with either disk not present

2012-06-21 Thread Arun Khan
On Thu, Jun 21, 2012 at 10:09 AM, Rob Kampen
rkam...@reaching-clients.com wrote:
> <snip>
>
> sounds like the mirror is not in synch - when it is running with both
> drives, what does
> cat /proc/mdstat

System boots up fully functional with both disks
<copy-paste>

root@centos62-raid1 ~
# cat /proc/mdstat
Personalities : [raid1]
md_d0 : active raid1 sda[0] sdb[1]
      10485696 blocks [2/2] [UU]

unused devices: <none>

</copy-paste>

Both disks are in sync.

Anyways, even if they were out of sync the system should boot with the
disk that is in U state but it does not.

System boots up in rdshell (failed mode) with one of the disks disconnected.

<cat /proc/mdstat>

# cat /proc/mdstat

Personalities:
md_d0: inactive sda[0] (S)
 10485696 blocks

</cat /proc/mdstat>

I do not know the internal workings of dracut but  the problem seems
to be within it (gut feeling).
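
The difference between the two listings can be checked mechanically.
A minimal sketch, assuming awk is available (it may not be inside the
rdshell itself); inactive_arrays is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: list md arrays that /proc/mdstat reports as inactive.
# inactive_arrays is a hypothetical helper; pass a file name to test
# it offline, otherwise it reads /proc/mdstat.
inactive_arrays() {
    # Split on ':'; field 1 is the array name, field 2 starts with the
    # state ("active" or "inactive") followed by the member list.
    awk -F: '/^md/ && $2 ~ /inactive/ { gsub(/[ \t]/, "", $1); print $1 }' \
        "${1:-/proc/mdstat}"
}

# Typical use on a booted system:
#   inactive_arrays
```

With the degraded listing shown above it would print md_d0, and with
the healthy two-disk listing it would print nothing.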

-- Arun Khan


Re: [CentOS] CentOS 6.2 on partitionable mdadm RAID1 (md_d0) - kernel panic with either disk not present

2012-06-21 Thread Scott Silva
on 6/20/2012 11:34 PM Arun Khan spake the following:
> On Thu, Jun 21, 2012 at 10:09 AM, Rob Kampen
> rkam...@reaching-clients.com wrote:
>> <snip>
>>
>> sounds like the mirror is not in synch - when it is running with both
>> drives, what does
>> cat /proc/mdstat
>
> System boots up fully functional with both disks
> <copy-paste>
>
> root@centos62-raid1 ~
> # cat /proc/mdstat
> Personalities : [raid1]
> md_d0 : active raid1 sda[0] sdb[1]
>       10485696 blocks [2/2] [UU]
>
> unused devices: <none>
>
> </copy-paste>
>
> Both disks are in sync.
>
> Anyways, even if they were out of sync the system should boot with the
> disk that is in U state but it does not.
>
> System boots up in rdshell (failed mode) with one of the disks disconnected.
>
> <cat /proc/mdstat>
>
> # cat /proc/mdstat
>
> Personalities:
> md_d0: inactive sda[0] (S)
>  10485696 blocks
>
> </cat /proc/mdstat>
>
> I do not know the internal workings of dracut but the problem seems
> to be within it (gut feeling).
>
> -- Arun Khan
 
Just a shot in the dark... DO all the fstab entries call out md devices?





Re: [CentOS] CentOS 6.2 on partitionable mdadm RAID1 (md_d0) - kernel panic with either disk not present

2012-06-20 Thread Arun Khan
On Wed, Jun 20, 2012 at 10:06 AM, Arun Khan knu...@gmail.com wrote:
> On Wed, Jun 20, 2012 at 1:00 AM,  m.r...@5-cent.us wrote:
>
>> <snip>
>>
>> For one thing, edit grub.conf and get *rid* of that idiot rhgb and quiet,
>> so you can actually see what's happening. Sounds to me as though it's
>> trying to switch root to a real drive from the virtual drive of the ramfs,
>> and it's not working. One thing you *might* also try is before you boot,
>> edit the kernel line in grub, and add rdshell at the end, so you boot into
>> grub's rudimentary shell if/when it fails, and you can look around and
>> find what it's seeing.
>
> Will try your suggestion and report back.

As mentioned already there are no issues with both disks connected.
In this scenario, I have changed the Partition ID of the
partitionable RAID1 partitions  /dev/md_d0p1 and /dev/md_d0p2 to 'fd'
and then rebooted the system (recall earlier these partitions had
Id=83).

I also made the changes to /boot/grub/grub.conf that Mark suggested.

Rebooted the system with both disks connected - system boots fine.
Messages are displayed including the md driver binding /dev/sda and
/dev/sdb. The  root device /dev/md_d0p1 is detected and it is
mounted on / and life is hunky dory.

Rebooted the system with disk1 removed: the kernel boots and the 'md'
driver tries to bind sda. At this point the system seems to hang
for a few seconds and then 'dracut' reports that it cannot find
/dev/md_d0p1 (the root partition):

 dracut Warning: No root device block:/dev/md_d0p1 found

Console image pasted here http://imagebin.org/217229

In the rdshell environment I can see that /etc/mdadm.conf is defined
but beyond this I don't know what to look for.

Changing the Partition Id for the RAID1 partitions to 'fd' does not help.

Any further suggestions and/or comments?

-- Arun Khan


Re: [CentOS] CentOS 6.2 on partitionable mdadm RAID1 (md_d0) - kernel panic with either disk not present

2012-06-20 Thread m . roth
Arun Khan wrote:
> On Wed, Jun 20, 2012 at 10:06 AM, Arun Khan knu...@gmail.com wrote:
>> On Wed, Jun 20, 2012 at 1:00 AM,  m.r...@5-cent.us wrote:
>>
>>> <snip>
>>>
>>> For one thing, edit grub.conf and get *rid* of that idiot rhgb and
>>> quiet,
<snip>
>>> edit the kernel line in grub, and add rdshell at the end, so you boot
>>> into grub's rudimentary shell if/when it fails, and you can look
>>> around and find what it's seeing.
>>
>> Will try your suggestion and report back.
<snip>
> Reboot the system with disk1 removed, the kernel boots, the 'md'
> driver tries to bind sda.  At this point the system seems to hang
> for a few seconds and then 'dracut' reports that it cannot find
> /dev/md_d0p1 (the root partition)
>
>  dracut Warning: No root device block:/dev/md_d0p1 found
>
> Console image pasted here http://imagebin.org/217229

At this point, I'm starting to wonder if the initrd.img has the drivers
for software RAID. You *might* need to rebuild that.

> In the rdshell environment I can see that /etc/mdadm.conf is defined
> but beyond this I don't know what to look for.
>
> Changing the Partition Id for the RAID1 partitions to 'fd' does not help.
>
> Any further suggestions and/or comments?

What devices are there in /dev/? /dev/sd? /dev/md?

 mark



Re: [CentOS] CentOS 6.2 on partitionable mdadm RAID1 (md_d0) - kernel panic with either disk not present

2012-06-20 Thread Arun Khan
On Wed, Jun 20, 2012 at 10:57 PM,  m.r...@5-cent.us wrote:
> Arun Khan wrote:
>
>> Reboot the system with disk1 removed, the kernel boots, the 'md'
>> driver tries to bind sda.  At this point the system seems to hang
>> for a few seconds and then 'dracut' reports that it cannot find
>> /dev/md_d0p1 (the root partition)
>>
>>          dracut Warning: No root device block:/dev/md_d0p1 found
>>
>> Console image pasted here http://imagebin.org/217229
>
> At this point, I'm starting to wonder if the initrd.img has the drivers
> for software RAID. You *might* need to rebuild that.

Using 'dracut' I did create a new initramfs file per the instructions
in the wiki.

Nonetheless, if the md module were missing from the new initramfs, one
would expect the boot to fail even with /dev/sda and /dev/sdb both
connected to the system. The fact that the system boots in this case
shows that the md driver is present.

See screenshot here http://imagebin.org/217246


>> In the rdshell environment I can see that /etc/mdadm.conf is defined
>> but beyond this I don't know what to look for.
>>
>> Changing the Partition Id for the RAID1 partitions to 'fd' does not help.
>>
>> Any further suggestions and/or comments?
>
> What devices are there in /dev/? /dev/sd? /dev/md?

/dev/md_d0
/dev/md/md-device-map

Please see screenshot http://imagebin.org/217263

-- Arun Khan


Re: [CentOS] CentOS 6.2 on partitionable mdadm RAID1 (md_d0) - kernel panic with either disk not present

2012-06-20 Thread Rob Kampen

On 06/21/2012 04:11 AM, Arun Khan wrote:

On Wed, Jun 20, 2012 at 10:06 AM, Arun Khan knu...@gmail.com wrote:

On Wed, Jun 20, 2012 at 1:00 AM, m.r...@5-cent.us wrote:

<snip>


For one thing, edit grub.conf and get *rid* of that idiot rhgb and quiet,
so you can actually see what's happening. Sounds to me as though it's
trying to switch root to a real drive from the virtual drive of the ramfs,
and it's not working. One thing you *might* also try is before you boot,
edit the kernel line in grub, and add rdshell at the end, so you boot into
grub's rudimentary shell if/when it fails, and you can look around and
find what it's seeing.


Will try your suggestion and report back.

As mentioned already there are no issues with both disks connected.
In this scenario, I have changed the Partition ID of the
partitionable RAID1 partitions  /dev/md_d0p1 and /dev/md_d0p2 to 'fd'
and then rebooted the system (recall earlier these partitions had
Id=83).

I also made the suggested changes to /boot/grub/grub.conf by Mark

Rebooted the system with both disks connected - system boots fine.
Messages are displayed including the md driver binding /dev/sda and
/dev/sdb. The  root device /dev/md_d0p1 is detected and it is
mounted on / and life is hunky dory.

Reboot the system with disk1 removed, the kernel boots, the 'md'
driver tries to bind sda.  At this point the system seems to hang
for a few seconds and then 'dracut' reports that it cannot find
/dev/md_d0p1 (the root partition)

  dracut Warning: No root device block:/dev/md_d0p1 found

Sounds like the mirror is not in sync. When it is running with both
drives, what does

cat /proc/mdstat

show?


Console image pasted here http://imagebin.org/217229

In the rdshell environment I can see that /etc/mdadm.conf is defined
but beyond this I don't know what to look for.

Changing the Partition Id for the RAID1 partitions to 'fd' does not help.

Any further suggestions and/or comments?

-- Arun Khan


Re: [CentOS] CentOS 6.2 on partitionable mdadm RAID1 (md_d0) - kernel panic with either disk not present

2012-06-19 Thread m . roth
Arun Khan wrote:
<snip>
> Following the instructions on CentOS Wiki
> http://wiki.centos.org/HowTos/Install_On_Partitionable_RAID1  I
> installed a min. server in Linux KVM setup (script shown below)
<snip>
> The system boots fine when both disks are available.
> When I remove either of the disks (delete the -drive file= line), the
> system boots to a point wherein the GRUB menu is displayed and the
> progress bar displays for a while till the white bar reaches about
> halfway point and then it:
>
> Kernel panic - not syncing: Attempted to kill init!
<snip>
> fdisk -l
> root@centos62-raid1 ~
> # fdisk -l
>
> Disk /dev/sda: 10.7 GB, 10737418240 bytes
> 255 heads, 63 sectors/track, 1305 cylinders
> Units = cylinders of 16065 * 512 = 8225280 bytes
> Sector size (logical/physical): 512 bytes / 512 bytes
> I/O size (minimum/optimal): 512 bytes / 512 bytes
> Disk identifier: 0x000e8353
>
>    Device Boot      Start         End      Blocks   Id  System
> /dev/sda1   *           1         523     4194304   83  Linux
> Partition 1 does not end on cylinder boundary.
> /dev/sda2             523        1045     4194304   83  Linux
> /dev/sda3            1045        1176     1048576   82  Linux swap / Solaris
<snip>
Ok, I see that it's hardware 512b blocks, so you're not running into
issues with 4k hardware blocks. I trust you installed grub on /dev/md0,
which I assume is /dev/sda1 and /dev/sdb1?

  mark




Re: [CentOS] CentOS 6.2 on partitionable mdadm RAID1 (md_d0) - kernel panic with either disk not present

2012-06-19 Thread Arun Khan
On Wed, Jun 20, 2012 at 12:11 AM,  m.r...@5-cent.us wrote:
> Arun Khan wrote:
>>    Device Boot      Start         End      Blocks   Id  System
>> /dev/sda1   *           1         523     4194304   83  Linux
>> Partition 1 does not end on cylinder boundary.
>> /dev/sda2             523        1045     4194304   83  Linux
>> /dev/sda3            1045        1176     1048576   82  Linux swap / Solaris
> <snip>
> Ok, I see that it's hardware 512b blocks, so you're not running into
> issues with 4k hardware blocks. I trust you installed grub on /dev/md0,
> which I assume is /dev/sda1 and /dev/sdb1?


Per the wiki instructions, there is no re-installation of GRUB, only
a couple of changes to the /boot/grub/grub.conf file installed by the
regular installation on /dev/sda. During the RAID1 creation process
the grub from /dev/sda would be mirrored into the RAID1 device and
appear on the MBR of both disks.

As I said in the OP, I do see the grub menu with either of the disks
unplugged, i.e. missing. The kernel does boot and the white
progress bar goes up to about 50% when the kernel panic occurs. I will
turn off the splash and see what comes up on the console. Gut
feeling: I suspect the problem is with the initrd image created
by dracut.

-- Arun Khan


Re: [CentOS] CentOS 6.2 on partitionable mdadm RAID1 (md_d0) - kernel panic with either disk not present

2012-06-19 Thread m . roth
Arun Khan wrote:
> On Wed, Jun 20, 2012 at 12:11 AM,  m.r...@5-cent.us wrote:
>> Arun Khan wrote:
>>>    Device Boot      Start         End      Blocks   Id  System
>>> /dev/sda1   *           1         523     4194304   83  Linux
>>> Partition 1 does not end on cylinder boundary.
>>> /dev/sda2             523        1045     4194304   83  Linux
>>> /dev/sda3            1045        1176     1048576   82  Linux swap / Solaris
>> <snip>
>> Ok, I see that it's hardware 512b blocks, so you're not running into
>> issues with 4k hardware blocks. I trust you installed grub on /dev/md0,
>> which I assume is /dev/sda1 and /dev/sdb1?
>
> Per the wiki instructions, there is no re-installation of GRUB, only
> a couple of changes to the /boot/grub/grub.conf file installed by the
> regular installation on /dev/sda. During the RAID1 creation process
> the grub from /dev/sda would be mirrored into the RAID1 device and
> appear on the MBR of both disks.
>
> As I said in the OP, I do see the grub menu with either of the disks
> unplugged, i.e. missing. The kernel does boot and the white
> progress bar goes up to about 50% when the kernel panic occurs. I will
> turn off the splash and see what comes up on the console. Gut
> feeling: I suspect the problem is with the initrd image created
> by dracut.

For one thing, edit grub.conf and get *rid* of that idiot rhgb and quiet,
so you can actually see what's happening. Sounds to me as though it's
trying to switch root to a real drive from the virtual drive of the ramfs,
and it's not working. One thing you *might* also try is before you boot,
edit the kernel line in grub, and add rdshell at the end, so you boot into
dracut's rudimentary shell if/when it fails, and you can look around and
find what it's seeing.

   mark



Re: [CentOS] CentOS 6.2 on partitionable mdadm RAID1 (md_d0) - kernel panic with either disk not present

2012-06-19 Thread Rob Kampen

On 06/20/2012 07:23 AM, Arun Khan wrote:

On Wed, Jun 20, 2012 at 12:11 AM, m.r...@5-cent.us wrote:

Arun Khan wrote:

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1         523     4194304   83  Linux
Partition 1 does not end on cylinder boundary.
/dev/sda2             523        1045     4194304   83  Linux
/dev/sda3            1045        1176     1048576   82  Linux swap / Solaris

raid needs Id of fd rather than 83 to auto detect??

<snip>
Ok, I see that it's hardware 512b blocks, so you're not running into
issues with 4k hardware blocks. I trust you installed grub on /dev/md0,
which I assume is /dev/sda1 and /dev/sdb1?


Per the wiki instructions, there is no re-installation of GRUB, only
a couple of changes to the /boot/grub/grub.conf file installed by the
regular installation on /dev/sda. During the RAID1 creation process
the grub from /dev/sda would be mirrored into the RAID1 device and
appear on the MBR of both disks.

As I said in the OP, I do see the grub menu with either of the disks
unplugged, i.e. missing. The kernel does boot and the white
progress bar goes up to about 50% when the kernel panic occurs. I will
turn off the splash and see what comes up on the console. Gut
feeling: I suspect the problem is with the initrd image created
by dracut.

-- Arun Khan


Re: [CentOS] CentOS 6.2 on partitionable mdadm RAID1 (md_d0) - kernel panic with either disk not present

2012-06-19 Thread m . roth
Rob Kampen wrote:
> On 06/20/2012 07:23 AM, Arun Khan wrote:
>> On Wed, Jun 20, 2012 at 12:11 AM, m.r...@5-cent.us wrote:
>>> Arun Khan wrote:
>>>>    Device Boot      Start         End      Blocks   Id  System
>>>> /dev/sda1   *           1         523     4194304   83  Linux
>>>> Partition 1 does not end on cylinder boundary.
>>>> /dev/sda2             523        1045     4194304   83  Linux
>>>> /dev/sda3            1045        1176     1048576   82  Linux swap / Solaris
> raid needs Id of fd rather than 83 to auto detect??

Good catch. A quick google got me a page on filesystem types, which had
this line:
fd Linux raid partition with autodetect using persistent superblock

> <snip>

   mark



Re: [CentOS] CentOS 6.2 on partitionable mdadm RAID1 (md_d0) - kernel panic with either disk not present

2012-06-19 Thread Arun Khan
On Wed, Jun 20, 2012 at 2:18 AM,  m.r...@5-cent.us wrote:
> Rob Kampen wrote:
>> On 06/20/2012 07:23 AM, Arun Khan wrote:
>>> On Wed, Jun 20, 2012 at 12:11 AM, m.r...@5-cent.us wrote:
>>>> Arun Khan wrote:
>>>>>    Device Boot      Start         End      Blocks   Id  System
>>>>> /dev/sda1   *           1         523     4194304   83  Linux
>>>>> Partition 1 does not end on cylinder boundary.
>>>>> /dev/sda2             523        1045     4194304   83  Linux
>>>>> /dev/sda3            1045        1176     1048576   82  Linux swap / Solaris
>> raid needs Id of fd rather than 83 to auto detect??
>
> Good catch. A quick google got me a page on filesystem types, which had
> this line:
> fd Linux raid partition with autodetect using persistent superblock


But this is supposed to be RAID1 on the *entire* disks and not on the
individual partitions.

The instructions on the wiki clearly state: do a regular install on
the first disk (I did leave a few blocks at the end of the first disk
as per the instructions) and then create a partitionable RAID1
md_d0.

http://wiki.centos.org/HowTos/Install_On_Partitionable_RAID1

<wiki quote>

..

Why would you want to have a system installed on a partitionable software RAID1?

If you are installing a system on a partitionable RAID you can use the
whole hard drive as a RAID component device, and since RAID1 is a
mirror, you will be able to boot your system from any of the drives in
case of failure without any additional tricks required to preserve
bootloader configuration, etc. And when you need to repair a failed
RAID volume with the whole hard drive as a RAID component, all you
have to do is to insert a new hard drive and run mdadm --add; no
partitioning or anything else required.

...

Steps for both CentOS 5 & 6

1. Install CentOS using standard installer on the first hard disk,
/dev/sda. Select manual partitioning during the installation, and
leave at least 1 unit at the very end of the disk unpartitioned. You
will be able to redeem most of this space back later. You need to
reserve this space for mdadm which stores its metadata at the last
chunk of a raid volume.

2. Boot from the CentOS installation disk in the Rescue mode. The
installer will ask you if you wish to mount an existing CentOS
installation, you must refuse.

3. Build the software RAID1 using mdadm in degraded mode, with
/dev/sda as the only drive:
mdadm --create --metadata=0.90 --level=1 --raid-devices=2 /dev/md_d0 /dev/sda missing

4. Add the mirror drive /dev/sdb into the raid and check /proc/mdstat
to see that the raid started building:
mdadm --add /dev/md_d0 /dev/sdb
cat /proc/mdstat
...

</wiki quote>

-- Arun Khan


Re: [CentOS] CentOS 6.2 on partitionable mdadm RAID1 (md_d0) - kernel panic with either disk not present

2012-06-19 Thread Arun Khan
On Wed, Jun 20, 2012 at 1:00 AM,  m.r...@5-cent.us wrote:

> <snip>
>
> For one thing, edit grub.conf and get *rid* of that idiot rhgb and quiet,
> so you can actually see what's happening. Sounds to me as though it's
> trying to switch root to a real drive from the virtual drive of the ramfs,
> and it's not working. One thing you *might* also try is before you boot,
> edit the kernel line in grub, and add rdshell at the end, so you boot into
> grub's rudimentary shell if/when it fails, and you can look around and
> find what it's seeing.


Will try your suggestion and report back.

-- Arun Khan