Re: failed raid1 drive

2009-10-28 Thread Roger Searle

Craig Falconer wrote:

Then two ways to progress:
0) Boot in single user mode
1) Add one new drive to the machine, partition it with similar but
larger partitions as appropriate.

2) Then use
mdadm --add /dev/md3 /dev/sdb4
mdadm --add /dev/md2 /dev/sdb3
mdadm --add /dev/md1 /dev/sdb2
mdadm --add /dev/md0 /dev/sdb1
sysctl -w dev.raid.speed_limit_max=999
3) While this is happening run
watch --int 10 cat /proc/mdstat
Wait until all the drives are synched
4) If you boot off this raidset you'll need to reinstall a boot
loader on each drive

5) Down the machine and remove the last 320 GB drive.
6) Install the other new drive, then boot.
7) Partition the other new drive the same as the first big drive
8) Repeat steps 2 and 3 but use sda rather than sdb
Once they're finished synching you can grow your filesystems to
their full available space (see the sketch after this list)

9) Do the boot loader install onto both drives again
10) Then you can reboot and it should all be good.
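
For the grow step at the end, something like this should do it once both
new drives are in and fully synched (just a sketch, assuming ext3/ext4 on
the md devices - use the matching grow tool, e.g. xfs_growfs, for other
filesystems):

mdadm --grow /dev/md3 --size=max   # let the md device use the full partition
resize2fs /dev/md3                 # then grow the filesystem to fill the md device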

I have a new drive installed, partitioned and formatted, ready to add to
the raidset. First, some questions related to the above, to ease my mind
before proceeding.

Is it necessary to boot to single user mode (and why?), since this will
make the machine unavailable to the network as a file server for the
duration of the process? The machine is used solely to serve up files.
Based on the time it took to re-add the drive last week, it would need
to go offline for some hours, which means either a very late (start
and) finish to a work day or doing it at a weekend to keep it
available to users during working days.


From my reading of man mdadm, it suggests doing a fail and remove of 
the faulty drive, possibly at the same time as adding a new device, like:

mdadm /dev/md0 --add /dev/sda1 --fail /dev/sdb1 --remove /dev/sdb1

Is this a good process to follow or is it redundant/unnecessary?

Just in case I run into issues reinstalling the boot loader from a live
CD: I understand that (as an interim measure) I would be able to boot
the machine from just the current good drive, with its partition marked
as bootable, by disconnecting the new drive?


Finally, I'm somewhat unclear how the resulting partitions are going to
work out: the current failing drive is /dev/sdb, /dev/sdc holds backups, and
the new larger drive comes up as /dev/sdd. Surely once sdb is physically
removed, sdc and sdd move up a letter, and this messes with adding to the
raid array as sdd?  Or is a better approach to do a fail & remove of the
failing drive, physically remove it and put the new drive on the same
sata connector?


Cheers,
Roger





Re: failed raid1 drive

2009-10-28 Thread Craig Falconer

Roger Searle wrote, On 29/10/09 10:47:

Craig Falconer wrote:

Then two ways to progress:
0) Boot in single user mode
1) Add one new drive to the machine, partition it with similar but
larger partitions as appropriate.

2) Then use
mdadm --add /dev/md3 /dev/sdb4
mdadm --add /dev/md2 /dev/sdb3
mdadm --add /dev/md1 /dev/sdb2
mdadm --add /dev/md0 /dev/sdb1
sysctl -w dev.raid.speed_limit_max=999
3) While this is happening run
watch --int 10 cat /proc/mdstat
Wait until all the drives are synched
4) If you boot off this raidset you'll need to reinstall a boot
loader on each drive

5) Down the machine and remove the last 320 GB drive.
6) Install the other new drive, then boot.
7) Partition the other new drive the same as the first big drive
8) Repeat steps 2 and 3 but use sda rather than sdb
Once they're finished synching you can grow your filesystems to
their full available space

9) Do the boot loader install onto both drives again
10) Then you can reboot and it should all be good.

I have a new drive installed, partitioned and formatted, ready to add to
the raidset. First, some questions related to the above, to ease my mind
before proceeding.
Is it necessary to boot to single user mode (and why?), since this will
make the machine unavailable to the network as a file server for the
duration of the process? The machine is used solely to serve up files.
Based on the time it took to re-add the drive last week, it would need
to go offline for some hours, which means either a very late (start
and) finish to a work day or doing it at a weekend to keep it
available to users during working days.


You're right - single user is not necessary.  The only real reason for
doing that is so that files aren't changed on your only disk and then
lost if there's some failure before the synch has completed.


BTW I did this last night on a live box and it worked fine.



 From my reading of man mdadm, it suggests doing a fail and remove of 
the faulty drive, possibly at the same time as adding a new device, like:

mdadm /dev/md0 --add /dev/sda1 --fail /dev/sdb1 --remove /dev/sdb1

Is this a good process to follow or is it redundant/unnecessary?


Sounds silly actually - remove the only good drive as you add the blank 
one?



Just in case I run into issues reinstalling the boot loader from a live
CD: I understand that (as an interim measure) I would be able to boot
the machine from just the current good drive, with its partition marked
as bootable, by disconnecting the new drive?


As long as the good drive is bootable it will be fine.  I had an issue 
where the boot loader was only on the second drive of a raid1, but the 
machine was fine until that second drive gave out.  The first drive then 
wasn't bootable.


You will want something like this for grub:

# grub --batch --no-floppy

then type in

root (hd0,0)
setup (hd0)
root (hd1,0)
setup (hd1)
quit
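
If you'd rather script that than type it interactively, a here-document
should work too (same assumption that the two raid members are the first
two BIOS drives, hd0 and hd1):

grub --batch --no-floppy <<EOF
root (hd0,0)
setup (hd0)
root (hd1,0)
setup (hd1)
quit
EOF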




Finally, I'm somewhat unclear how the resulting partitions are going to
work out: the current failing drive is /dev/sdb, /dev/sdc holds backups, and
the new larger drive comes up as /dev/sdd. Surely once sdb is physically
removed, sdc and sdd move up a letter, and this messes with adding to the
raid array as sdd?  Or is a better approach to do a fail & remove of the
failing drive, physically remove it and put the new drive on the same
sata connector?


Check your dmesg output for things like
md:  adding sda5 ...
md: sda3 has different UUID to sdb5
md: sda2 has different UUID to sdb5
md: sda1 has different UUID to sdb5
md: created md1


As long as the partition type is FD, the kernel will try to use it
to assemble a raid device.
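
To double-check the new drive before adding it (a sketch - this assumes the
new drive really does show up as /dev/sdd; adjust the device and partition
numbers to suit):

sfdisk -l /dev/sdd                 # list the partitions and their type codes
sfdisk --change-id /dev/sdd 1 fd   # set partition 1 to fd (Linux raid autodetect) if needed
mdadm --examine /dev/sdd1          # show any existing raid superblock on the partition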



--
Craig Falconer


lp0 permission problems

2009-10-28 Thread Barry

I had a problem where cups would not print to lp0. I finally solved it
by changing permissions from 660 to 666.

Since running the print job I have rebooted, and I find that the permissions
on lp0 have reverted to 660. Can someone tell me where this is (re)set on
startup and/or how to fix the problem?

System is Mandriva2009.1

TIA
Barry



Re: lp0 permission problems

2009-10-28 Thread Craig Falconer

Barry wrote, On 29/10/09 11:16:

I had a problem where cups would not print to lp0. I finally solved it
by changing permissions from 660 to 666.

Since running the print job I have rebooted, and I find that the permissions
on lp0 have reverted to 660. Can someone tell me where this is (re)set on
startup and/or how to fix the problem?



Sounds like udev is doing it wrong.  You could twiddle udev, or add the
user that cups runs as to the group which owns /dev/lp0, so that it is
covered by the group permissions (the middle 6) rather than the world
permissions.
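
Either approach might look something like this (just a sketch - the group
that owns /dev/lp0 and the user cups runs as both vary by distro, so check
yours first; the rules filename below is only an example):

ls -l /dev/lp0            # see which group owns the device (often "lp")
usermod -a -G lp daemon   # add the cups user to that group ("daemon" is a placeholder)

or, for the udev route, a rule along these lines in e.g.
/etc/udev/rules.d/91-local-lp.rules:

KERNEL=="lp[0-9]*", GROUP="lp", MODE="0660"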



--
Craig Falconer



Re: failed raid1 drive

2009-10-28 Thread Roger Searle

Roger Searle wrote, On 29/10/09 10:47:
 From my reading of man mdadm, it suggests doing a fail and remove of 
the faulty drive, possibly at the same time as adding a new device, 
like:

mdadm /dev/md0 --add /dev/sda1 --fail /dev/sdb1 --remove /dev/sdb1

Is this a good process to follow or is it redundant/unnecessary?

Craig Falconer wrote:
Sounds silly actually - remove the only good drive as you add the 
blank one?

Perhaps I have confused things by quoting that line directly from the man
page rather than changing it to reflect my actual devices - it is just
saying that in one line you can add a new device (sda1 in the example)
and remove a failed one (sdb1).  I'd be adding sdd; does that sound
better?  The question is really more about whether I need to fail and
remove the bad drive at all?
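
With the actual devices here, the man page one-liner would presumably become
something along the lines of (a sketch - repeat per md device with the right
partition numbers, and only if the kernel still sees the failed drive as sdb):

mdadm /dev/md0 --fail /dev/sdb1 --remove /dev/sdb1 --add /dev/sdd1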


Re: failed raid1 drive

2009-10-28 Thread Craig Falconer

Roger Searle wrote, On 29/10/09 11:34:

Roger Searle wrote, On 29/10/09 10:47:
 From my reading of man mdadm, it suggests doing a fail and remove of 
the faulty drive, possibly at the same time as adding a new device, 
like:

mdadm /dev/md0 --add /dev/sda1 --fail /dev/sdb1 --remove /dev/sdb1

Is this a good process to follow or is it redundant/unnecessary?

Craig Falconer wrote:
Sounds silly actually - remove the only good drive as you add the 
blank one?

Perhaps I have confused things by quoting that line directly from the man
page rather than changing it to reflect my actual devices - it is just
saying that in one line you can add a new device (sda1 in the example)
and remove a failed one (sdb1).  I'd be adding sdd; does that sound
better?  The question is really more about whether I need to fail and
remove the bad drive at all?


You'll have to power off the box to change the drive anyway, unless you 
are feeling really adventurous and want to hot swap.


I suggest you down the box, swap out the drive, then bring it all back 
up.  The raid will assemble degraded and then you can go from there.
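
Once it comes back up, something like this should confirm the degraded state
and kick off the rebuild (a sketch - the device names assume the new drive
takes over /dev/sdb; check what it actually comes up as first):

cat /proc/mdstat                  # degraded arrays show a missing member, e.g. [U_]
mdadm --detail /dev/md0           # per-array view of which device is missing
mdadm --add /dev/md0 /dev/sdb1    # add the matching partition from the new drive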






--
Craig Falconer