Re: failed raid1 drive
Craig Falconer wrote:
> Then two ways to progress:
>
> 0) Boot in single user mode.
> 1) Add one new drive to the machine, partition it with similar but larger partitions as appropriate.
> 2) Then use:
>      mdadm --add /dev/md3 /dev/sdb4
>      mdadm --add /dev/md2 /dev/sdb3
>      mdadm --add /dev/md1 /dev/sdb2
>      mdadm --add /dev/md0 /dev/sdb1
>      sysctl -w dev.raid.speed_limit_max=999
> 3) While this is happening run: watch --int 10 cat /proc/mdstat
>    Wait until all the drives are synched.
> 4) If you boot off this raidset you'll need to reinstall a boot loader on each drive.
> 5) Down the machine and remove the last 320 GB drive.
> 6) Install the other new drive, then boot.
> 7) Partition the other new drive the same as the first big drive.
> 8) Repeat steps 2 and 3, but use sda rather than sdb. Once they're finished synching you can grow your filesystems to their full available space.
> 9) Do the boot loader install onto both drives again.
> 10) Then you can reboot and it should all be good.

I have a new drive installed, partitioned and formatted, ready to add to the raidset. First, some questions related to the above, to ease my mind before proceeding.

Is it necessary to boot to single user mode (and why?), since this will make the machine unavailable to the network as a file server for the duration of the process? The machine is used solely to serve up files. Based on the time it took to re-add the drive last week, it would need to go offline for some hours, which means either a very late (start and) finish to a work day or doing it at a weekend to keep it available to users during working days.

From my reading of man mdadm, it suggests doing a fail and remove of the faulty drive, possibly at the same time as adding a new device, like:

  mdadm /dev/md0 --add /dev/sda1 --fail /dev/sdb1 --remove /dev/sdb1

Is this a good process to follow, or is it redundant/unnecessary?

Just in case I run into issues reinstalling the boot loader from a live CD, I understand that I would (as an interim measure) be able to boot the machine from just the current good drive, with a single partition marked as bootable, by disconnecting the new drive?

Finally, I'm somewhat unclear how the resulting partitions are going to work out: the current failing drive is /dev/sdb, /dev/sdc holds backups, and the new larger drive comes up as /dev/sdd. Surely once sdb is physically removed, sdc and sdd move up a letter, and this messes with adding to the raid array as sdd? Or is a better approach to do a fail/remove of the failing drive, physically remove it, and put the new drive on the same SATA connector?

Cheers, Roger
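For reference, the resync kicked off in step 2 can be watched and throttled from another shell while the box stays up. This is only a sketch: the array name md0 is taken from the thread, the speed value is illustrative, and the sysctl limits are in KB/s per device.

  # Watch rebuild progress, refreshing every 10 seconds
  watch --interval 10 cat /proc/mdstat

  # More detail on one array: state, degraded/rebuilding, which member is the spare
  mdadm --detail /dev/md0

  # The md resync throttles itself between these two limits (KB/s per device);
  # raising speed_limit_min makes the kernel rebuild more aggressively even
  # while the machine is busy serving files.
  sysctl dev.raid.speed_limit_min dev.raid.speed_limit_max
  sysctl -w dev.raid.speed_limit_min=50000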
Re: failed raid1 drive
Roger Searle wrote, On 29/10/09 10:47:
> Craig Falconer wrote:
>> Then two ways to progress:
>> 0) Boot in single user mode.
>> 1) Add one new drive to the machine, partition it with similar but larger partitions as appropriate.
>> 2) Then use:
>>      mdadm --add /dev/md3 /dev/sdb4
>>      mdadm --add /dev/md2 /dev/sdb3
>>      mdadm --add /dev/md1 /dev/sdb2
>>      mdadm --add /dev/md0 /dev/sdb1
>>      sysctl -w dev.raid.speed_limit_max=999
>> 3) While this is happening run: watch --int 10 cat /proc/mdstat
>>    Wait until all the drives are synched.
>> 4) If you boot off this raidset you'll need to reinstall a boot loader on each drive.
>> 5) Down the machine and remove the last 320 GB drive.
>> 6) Install the other new drive, then boot.
>> 7) Partition the other new drive the same as the first big drive.
>> 8) Repeat steps 2 and 3, but use sda rather than sdb. Once they're finished synching you can grow your filesystems to their full available space.
>> 9) Do the boot loader install onto both drives again.
>> 10) Then you can reboot and it should all be good.
>
> I have a new drive installed, partitioned and formatted, ready to add to the raidset. First, some questions related to the above, to ease my mind before proceeding.
>
> Is it necessary to boot to single user mode (and why?), since this will make the machine unavailable to the network as a file server for the duration of the process? The machine is used solely to serve up files. Based on the time it took to re-add the drive last week, it would need to go offline for some hours, which means either a very late (start and) finish to a work day or doing it at a weekend to keep it available to users during working days.

You're right - single user is not necessary. The only real reason for doing that is so that files aren't changed on your only disk in case something fails before the synch has completed. BTW I did this last night on a live box and it worked fine.

> From my reading of man mdadm, it suggests doing a fail and remove of the faulty drive, possibly at the same time as adding a new device, like:
>
>   mdadm /dev/md0 --add /dev/sda1 --fail /dev/sdb1 --remove /dev/sdb1
>
> Is this a good process to follow, or is it redundant/unnecessary?

Sounds silly actually - remove the only good drive as you add the blank one?

> Just in case I run into issues reinstalling the boot loader from a live CD, I understand that I would (as an interim measure) be able to boot the machine from just the current good drive, with a single partition marked as bootable, by disconnecting the new drive?

As long as the good drive is bootable it will be fine. I had an issue where the boot loader was only on the second drive of a raid1, but the machine was fine until that second drive gave out. The first drive then wasn't bootable.

You will want something like this for grub:

  # grub --batch --no-floppy

then type in:

  root (hd0,0)
  setup (hd0)
  root (hd1,0)
  setup (hd1)
  quit

> Finally, I'm somewhat unclear how the resulting partitions are going to work out: the current failing drive is /dev/sdb, /dev/sdc holds backups, and the new larger drive comes up as /dev/sdd. Surely once sdb is physically removed, sdc and sdd move up a letter, and this messes with adding to the raid array as sdd? Or is a better approach to do a fail/remove of the failing drive, physically remove it, and put the new drive on the same SATA connector?

Check your dmesg output for things like:

  md: adding sda5 ...
  md: sda3 has different UUID to sdb5
  md: sda2 has different UUID to sdb5
  md: sda1 has different UUID to sdb5
  md: created md1

As long as the partition type is FD then the kernel will try to use it to assemble a raid device.

-- Craig Falconer
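For reference, step 7 (partitioning the second new drive the same as the first) is often done by copying the partition table across with sfdisk, and the final grow of the filesystems can be done online once both large drives are in. This is a sketch only: it assumes MBR/DOS partition tables and ext2/ext3 filesystems on the md devices, neither of which the thread actually states, and the device names follow Craig's example.

  # Duplicate the partition layout from the first big drive onto the second
  # (MBR labels only; double-check the device names before running this)
  sfdisk -d /dev/sda | sfdisk /dev/sdb

  # Once both members of an array are on the larger drives, let md claim the
  # extra space, then grow the filesystem that sits on top of it
  mdadm --grow /dev/md3 --size=max
  resize2fs /dev/md3    # ext2/ext3 can be grown while mounted; other filesystems have their own tools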
lp0 permission problems
I had a problem where cups would not print to lp0. I finally solved it by changing permissions from 660 to 666. Since running the print job I have rebooted, and find that permissions on lp0 have reverted to 660. Can someone tell me where this is (re)set on startup and/or how to fix the problem? System is Mandriva 2009.1.

TIA
Barry
Re: lp0 permission problems
Barry wrote, On 29/10/09 11:16:
> I had a problem where cups would not print to lp0. I finally solved it by changing permissions from 660 to 666. Since running the print job I have rebooted, and find that permissions on lp0 have reverted to 660. Can someone tell me where this is (re)set on startup and/or how to fix the problem?

Sounds like udev is doing it wrong. You could twiddle udev, or add the user that cups runs as to the group which owns /dev/lp0, so that it is covered by the group permissions (the middle 6) rather than the world permissions.

-- Craig Falconer
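If udev is the culprit, the usual fixes are either group membership for the user cups prints as, or a local udev rule so the device comes up with the permissions you want after every boot. A sketch, assuming the device is owned by group lp; the rules file name is only a suggestion, and the user cups runs as varies by distro (check the User directive in /etc/cups/cupsd.conf).

  # See who currently owns /dev/lp0 and which group it belongs to
  ls -l /dev/lp0

  # Option 1: put the cups user into that group
  # ("cupsuser" is a placeholder for whichever user cups actually prints as)
  usermod -a -G lp cupsuser

  # Option 2: a local udev rule, e.g. in /etc/udev/rules.d/99-local-lp.rules,
  # so lp0 comes up group-writable at boot instead of world-writable:
  #   KERNEL=="lp[0-9]*", GROUP="lp", MODE="0660"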
Re: failed raid1 drive
Roger Searle wrote, On 29/10/09 10:47:
> From my reading of man mdadm, it suggests doing a fail and remove of the faulty drive, possibly at the same time as adding a new device, like:
>
>   mdadm /dev/md0 --add /dev/sda1 --fail /dev/sdb1 --remove /dev/sdb1
>
> Is this a good process to follow or is it redundant/unnecessary?

Craig Falconer wrote:
> Sounds silly actually - remove the only good drive as you add the blank one?

Perhaps I have confused things by quoting that line direct from the man page rather than changing it to reflect my actual devices - it is just saying that in one line you can add a new device (the example being sda1) and remove a failed one (sdb1). I'd be adding sdd. Does that sound better? The question really being more about the need to fail and remove the bad drive.
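Spelled out with the device names mentioned earlier in the thread (failing drive sdb, new drive sdd), the one-line form from the man page would look something like this for the first array; the partition numbers are assumed to line up the same way on both drives.

  # Fail and remove the old member, then add the new one, in a single invocation;
  # mdadm applies the operations in the order they appear on the command line
  mdadm /dev/md0 --fail /dev/sdb1 --remove /dev/sdb1 --add /dev/sdd1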
Re: failed raid1 drive
Roger Searle wrote, On 29/10/09 11:34:
> Roger Searle wrote, On 29/10/09 10:47:
>> From my reading of man mdadm, it suggests doing a fail and remove of the faulty drive, possibly at the same time as adding a new device, like:
>>
>>   mdadm /dev/md0 --add /dev/sda1 --fail /dev/sdb1 --remove /dev/sdb1
>>
>> Is this a good process to follow or is it redundant/unnecessary?
>
> Craig Falconer wrote:
>> Sounds silly actually - remove the only good drive as you add the blank one?
>
> Perhaps I have confused things by quoting that line direct from the man page rather than changing it to reflect my actual devices - it is just saying that in one line you can add a new device (the example being sda1) and remove a failed one (sdb1). I'd be adding sdd. Does that sound better? The question really being more about the need to fail and remove the bad drive.

You'll have to power off the box to change the drive anyway, unless you are feeling really adventurous and want to hot swap. I suggest you down the box, swap out the drive, then bring it all back up. The raid will assemble degraded and then you can go from there.

-- Craig Falconer
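Once the box is back up with the drive swapped, something along these lines would confirm the degraded state before re-adding. Which letter the new drive ends up with depends on the SATA port it lands on, so check dmesg before partitioning it; the sdb name below is only a guess based on reusing the old drive's connector.

  # A missing member shows up as an underscore, e.g. [U_], in the status line
  cat /proc/mdstat

  # Confirm what the kernel called the newly installed drive
  dmesg | grep -i sd

  # Then partition it and add its partitions back, one per md device, e.g.:
  mdadm --add /dev/md0 /dev/sdb1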