Re: raid1 or raid10 for /boot

2008-02-04 Thread Robin Hill
On Mon Feb 04, 2008 at 07:34:54AM +0100, Keld Jørn Simonsen wrote:

 I understand that lilo and grub only can boot partitions that look like
 a normal single-drive partition. And then I understand that a plain
 raid10 has a layout which is equivalent to raid1. Can such a raid10
 partition be used with grub or lilo for booting?
 And would there be any advantages in this, for example better disk
 utilization in the raid10 driver compared with raid?
 
A plain RAID-10 does _not_ have a layout equivalent to RAID-1 and
_cannot_ be used for booting (well, possibly a 2-disk RAID-10 could -
I'm not sure how that'd be laid out).  RAID-10 uses striping as well as
mirroring, and the striping breaks both grub and lilo (and, AFAIK, every
other boot manager currently out there).

Cheers,
Robin


Re: raid1 or raid10 for /boot

2008-02-04 Thread Robin Hill
On Mon Feb 04, 2008 at 12:21:40PM +0100, Keld Jørn Simonsen wrote:

 On Mon, Feb 04, 2008 at 09:17:35AM +0000, Robin Hill wrote:
  On Mon Feb 04, 2008 at 07:34:54AM +0100, Keld Jørn Simonsen wrote:
  
   I understand that lilo and grub only can boot partitions that look like
   a normal single-drive partition. And then I understand that a plain
   raid10 has a layout which is equivalent to raid1. Can such a raid10
   partition be used with grub or lilo for booting?
   And would there be any advantages in this, for example better disk
   utilization in the raid10 driver compared with raid?
   
  A plain RAID-10 does _not_ have a layout equivalent to RAID-1 and
  _cannot_ be used for booting (well, possibly a 2-disk RAID-10 could -
  I'm not sure how that'd be laid out).  RAID-10 uses striping as well as
  mirroring, and the striping breaks both grub and lilo (and, AFAIK, every
  other boot manager currently out there).
 
 Yes, it is understood that raid10,f2 uses striping, but a raid10,near=2,
 far=1 does not use striping, and this is what you get if you just run
 mdadm --create /dev/md0 -l 10 -n 2 /dev/sda1 /dev/sdb1
 
Well yes, if you do a two-disk RAID-10 then (as I said above) you
probably end up with a RAID-1 (as you do with a two-disk RAID-5).  I
don't see how this would work any differently (or better) than a RAID-1
though (and only serves to confuse things).

If you have more than two disks then RAID-10 will _always_ stripe (no
matter whether you use near, far or offset layout - these affect only
where the mirrored chunks are put) and grub/lilo will fail to work.
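
As a rough illustration (device names and disk counts here are just
examples, not taken from your setup), the layout is picked with mdadm's
--layout option, and only the two-disk near case degenerates into a
plain mirror:

  # two disks, near layout - effectively RAID-1: each disk is readable
  # on its own, so a boot loader can treat it as a normal partition
  mdadm --create /dev/md0 --level=10 --raid-devices=2 --layout=n2 \
        /dev/sda1 /dev/sdb1

  # four disks (n2, f2 or o2) - data is striped across the disks, so no
  # single disk holds a complete filesystem image and grub/lilo choke
  mdadm --create /dev/md0 --level=10 --raid-devices=4 --layout=f2 \
        /dev/sd[abcd]1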

Cheers,
Robin



Re: RAID needs more to survive a power hit, different /boot layout for example (was Re: draft howto on making raids for surviving a disk crash)

2008-02-04 Thread Robin Hill
On Mon Feb 04, 2008 at 05:06:09AM -0600, Moshe Yudkowsky wrote:

 Robin, thanks for the explanation. I have a further question.

 Robin Hill wrote:

 Once the file system is mounted then hdX,Y maps according to the
 device.map file (which may actually bear no resemblance to the drive
 order at boot - I've had issues with this before).  At boot time it maps
 to the BIOS boot order though, and (in my experience anyway) hd0 will
 always map to the drive the BIOS is booting from.

 At the time that I use grub to write to the MBR, hd2,1 is /dev/sdc1. 
 Therefore, I don't quite understand why this would not work:

 grub <<EOF
 root(hd2,1)
 setup(hd2)
 EOF

 This would seem to be a command to have the MBR on hd2 written to use the 
 boot on hd2,1. It's valid when written. Are you saying that it's a command 
 for the MBR on /dev/sdc to find the data on (hd2,1), the location of which 
 might change at any time? That's... a  very strange way to write the tool. 
 I thought it would be a command for the MBR on hd2 (sdc) to look at hd2,1 
 (sdc1) to find its data, regardless of the boot order that caused sdc to be 
 the boot disk.

This is exactly what it does, yes - the hdX,Y are mapped by GRUB into
BIOS disk interfaces (0x80 being the first, 0x81 the second and so on)
and it writes (to /dev/sdc in this case) the instructions to look on the
first partition of BIOS drive 0x82 (whichever drive that ends up being)
for the rest of the bootloader.
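
For illustration only (the device names are assumed from this thread),
the install-time mapping comes from GRUB's device.map, while what ends
up in the MBR is just the BIOS drive number it resolves to:

  # /boot/grub/device.map as it might look when grub is run:
  (hd0)  /dev/sda
  (hd1)  /dev/sdb
  (hd2)  /dev/sdc

  # so "setup (hd2)" writes /dev/sdc's MBR, and "root (hd2,1)" is stored
  # as partition 1 of BIOS drive 0x82 - whichever physical disk the BIOS
  # happens to present as 0x82 on the day you boot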

It is a bit of a strange way to work, but it's really the only way it
_can_ work (and cover all circumstances).  Unfortunately when you start
playing with bootloaders you have to get down to the BIOS level, and
things weren't written to make sense at that level (after all, when
these standards were put in place everyone was booting from a single
floppy disk system).  If EFI becomes more widespread then hopefully
this will get simpler, but we're stuck with things as they are for now.

Cheers,
Robin



Re: using update-initramfs: how to get new mdadm.conf into the /boot? Or is it XFS?

2008-02-04 Thread Robin Hill
On Mon Feb 04, 2008 at 02:59:44PM -0600, Moshe Yudkowsky wrote:

 I've managed to get myself into a little problem.

 Since power hits were taking out the /boot partition, I decided to split 
 /boot out of root. Working from my emergency partition,  I copied all files 
 from /root, re-partitioned what had been /root into room for /boot and 
 /root, and then created the drives. This left me with /dev/md/boot, 
 /dev/md/root, and /dev/md/base (everything else).

 I modified mdadm.conf on the emergency partition, used update-initramfs to 
 make certain that the new md drives would be recognized, and rebooted. This 
 worked as expected.

 I then mounted the entire new file system on a mount point, copied the 
 mdadm.conf to that point, did a chroot to that point, and did an 
 update-initramfs so that the non-emergency partition would have the updated 
 mdadm.conf. This worked -- but with complaints about missing the file 
 /proc/modules (which is not present under chroot). If I use the -v option I 
 can see the raid456, raid1, etc. modules loading.

 I modified menu.lst to make certain that boot=/dev/md/boot, ran grub 
 (thanks, Robin!) successfully.

 Problem: on reboot, I get an error message:

 root (hd0,1)  (Moshe comment: as expected)
 Filesystem type is xfs, partition type 0xfd (Moshe comment: as expected)
 kernel /boot/vmliuz-etc.-amd64 root=/dev/md/boot ro
---^^

Are you sure that's right?  Looks like a typo to me.

Cheers,
Robin


Re: using update-initramfs: how to get new mdadm.conf into the /boot? Or is it XFS?

2008-02-04 Thread Robin Hill
On Mon Feb 04, 2008 at 02:59:44PM -0600, Moshe Yudkowsky wrote:

 I've managed to get myself into a little problem.

 Since power hits were taking out the /boot partition, I decided to split 
 /boot out of root. Working from my emergency partition,  I copied all files 
 from /root, re-partitioned what had been /root into room for /boot and 
 /root, and then created the drives. This left me with /dev/md/boot, 
 /dev/md/root, and /dev/md/base (everything else).

 I modified mdadm.conf on the emergency partition, used update-initramfs to 
 make certain that the new md drives would be recognized, and rebooted. This 
 worked as expected.

 I then mounted the entire new file system on a mount point, copied the 
 mdadm.conf to that point, did a chroot to that point, and did an 
 update-initramfs so that the non-emergency partition would have the updated 
 mdadm.conf. This worked -- but with complaints about missing the file 
 /proc/modules (which is not present under chroot). If I use the -v option I 
 can see the raid456, raid1, etc. modules loading.

 I modified menu.lst to make certain that boot=/dev/md/boot, ran grub 
 (thanks, Robin!) successfully.

 Problem: on reboot, I get an error message:

 root (hd0,1)  (Moshe comment: as expected)
 Filesystem type is xfs, partition type 0xfd (Moshe comment: as expected)
 kernel /boot/vmliuz-etc.-amd64 root=/dev/md/boot ro

 Error 15: File not found

File not found at that point would suggest it can't find the kernel
file.  The path here should be relative to the root of the partition
/boot is on, so if your /boot is its own partition then you should
either drop the /boot prefix (i.e. use "kernel /vmlinuz-...") or (the
more usual solution from what I've seen) make sure there's a symlink:
ln -s . /boot/boot
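
A minimal menu.lst stanza for the separate-/boot case might then look
something like this (the kernel version is a placeholder, and I'm
assuming the root filesystem lives on /dev/md/root, per the thread's
naming - root= should point at the root array, not the /boot one):

  title  Linux (example entry)
  root   (hd0,1)
  kernel /vmlinuz-2.6.18-5-amd64 root=/dev/md/root ro
  initrd /initrd.img-2.6.18-5-amd64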

HTH,
Robin


Re: RAID needs more to survive a power hit, different /boot layout for example (was Re: draft howto on making raids for surviving a disk crash)

2008-02-03 Thread Robin Hill
On Sun Feb 03, 2008 at 01:15:10PM -0600, Moshe Yudkowsky wrote:

 I've been reading the draft and checking it against my experience. Because 
 of local power fluctuations, I've just accidentally checked my system:  My 
 system does *not* survive a power hit. This has happened twice already 
 today.

 I've got /boot and a few other pieces in a 4-disk RAID 1 (three running, 
 one spare). This partition is on /dev/sd[abcd]1.

 I've used grub to install grub on all three running disks:

 grub --no-floppy <<EOF
 root (hd0,1)
 setup (hd0)
 root (hd1,1)
 setup (hd1)
 root (hd2,1)
 setup (hd2)
 EOF

 (To those reading this thread to find out how to recover: According to 
 grub's map option, /dev/sda1 maps to hd0,1.)

This is wrong - the disk you boot from will always be hd0 (no matter
what the map file says - that's only used after the system's booted).
You need to remap the hd0 device for each disk:

grub --no-floppy <<EOF
root (hd0,1)
setup (hd0)
device (hd0) /dev/sdb
root (hd0,1)
setup (hd0)
device (hd0) /dev/sdc
root (hd0,1)
setup (hd0)
device (hd0) /dev/sdd
root (hd0,1)
setup (hd0)
EOF


 After the power hit, I get:

  Error 16
  Inconsistent filesystem mounted

 I then tried to boot up on hda1,1, hdd2,1 -- none of them worked.

 The culprit, in my opinion, is the reiserfs file system. During the power 
 hit, the reiserfs file system of /boot was left in an inconsistent state; 
 this meant I had up to three bad copies of /boot.

Could well be - I always use ext2 for the /boot filesystem and don't
have it automounted.  I only mount the partition to install a new
kernel, then unmount it again.
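
If it's useful, a sketch of that arrangement in /etc/fstab (device name
assumed to match your naming) - noauto keeps /boot unmounted except when
you explicitly want it:

  # /etc/fstab entry - /boot stays unmounted in normal running
  /dev/md/boot   /boot   ext2   noauto   0   0

Then it's just a case of "mount /boot" before installing a new kernel
and "umount /boot" afterwards.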

Cheers,
Robin


Re: RAID needs more to survive a power hit, different /boot layout for example (was Re: draft howto on making raids for surviving a disk crash)

2008-02-03 Thread Robin Hill
On Sun Feb 03, 2008 at 02:46:54PM -0600, Moshe Yudkowsky wrote:

 Robin Hill wrote:

 This is wrong - the disk you boot from will always be hd0 (no matter
 what the map file says - that's only used after the system's booted).
 You need to remap the hd0 device for each disk:
 grub --no-floppy <<EOF
 root (hd0,1)
 setup (hd0)
 device (hd0) /dev/sdb
 root (hd0,1)
 setup (hd0)
 device (hd0) /dev/sdc
 root (hd0,1)
 setup (hd0)
 device (hd0) /dev/sdd
 root (hd0,1)
 setup (hd0)
 EOF

 For my enlightenment: if the file system is mounted, then hd2,1 is a 
 sensible grub operation, isn't it? For the record, given my original script 
 when I boot I am able to edit the grub boot options to read

 root (hd2,1)

 and proceed to boot.

Once the file system is mounted then hdX,Y maps according to the
device.map file (which may actually bear no resemblance to the drive
order at boot - I've had issues with this before).  At boot time it maps
to the BIOS boot order though, and (in my experience anyway) hd0 will
always map to the drive the BIOS is booting from.

So initially you may have:
SATA-1: hd0
SATA-2: hd1
SATA-3: hd2

Now, if the SATA-1 drive dies totally you will have:
SATA-1: -
SATA-2: hd0
SATA-3: hd1

or if SATA-2 dies:
SATA-1: hd0
SATA-2: -
SATA-3: hd1

Note that in the case where the drive is still detected but fails to
boot, the behaviour seems to be very BIOS dependent - some will continue
on to the next drive as above, whereas others will just sit and complain.

So to answer the second part of your question, yes - at boot time
currently you can do root (hd2,1) or root (hd3,1).  If a disk dies,
however (whichever disk it is), then root (hd3,1) will fail to work.

Note also that the above is only my experience - if you're depending on
certain behaviour under these circumstances then you really need to test
it out on your hardware by disconnecting drives, substituting
non-bootable drives, etc.

HTH,
Robin


Re: Error mounting a reiserfs on renamed raid1

2008-01-25 Thread Robin Hill
On Fri Jan 25, 2008 at 01:48:32AM +0100, Clemens Koller wrote:

 Hi there.

 I am new to this list, however didn't find this effect nor a
 solution to my problem in the archives or with google:

 short story:
 
 A single raid1 as /dev/md0 containing a reiserfs (with important data)
 assembled during boot works just fine:
 $ cat /proc/mdstat
 Personalities : [linear] [raid0] [raid1]
 md0 : active raid1 hdg1[1] hde1[0]
   293049600 blocks [2/2] [UU]

 The same raid1 moved to another machine as a fourth raid can be
 assembled manually as /dev/md3 (to work around naming conflicts),
 but it cannot be mounted anymore:
 $ mdadm --assemble /dev/md3 --update=super-minor -m0 /dev/hde /dev/hdg
 does not complain. /dev/md3 is created. But

It looks like you should be assembling the partitions, not the disks.
Certainly the mdstat entry above shows the array being formed from the
disks.  Try:
  mdadm --assemble /dev/md3 --update=super-minor -m0 /dev/hde1 /dev/hdg1

 $ mount /dev/md3 /raidmd3 gives:

 Jan 24 20:24:10 rio kernel: md: md3 stopped.
 Jan 24 20:24:10 rio kernel: md: bind<hdg>
 Jan 24 20:24:10 rio kernel: md: bind<hde>
 Jan 24 20:24:10 rio kernel: raid1: raid set md3 active with 2 out of 2 
 mirrors
 Jan 24 20:24:12 rio kernel: ReiserFS: md3: warning: sh-2021: 
 reiserfs_fill_super: can not find reiserfs on md3

 Adding  -t reiserfs doesn't work either.

Presumably the superblock for the file system cannot be found because
it's now offset due to the above issue.
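
A quick way to confirm which devices actually carry md superblocks (and
hence what should be assembled) is to examine both the whole disks and
the partitions - device names as in your mail:

  mdadm --examine /dev/hde /dev/hde1
  mdadm --examine /dev/hdg /dev/hdg1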

HTH,
Robin


Re: mdadm error when trying to replace a failed drive in RAID5 array

2008-01-20 Thread Robin Hill
On Sat Jan 19, 2008 at 11:08:43PM -, Steve Fairbairn wrote:

 
 Hi All,
 
 I have a Software RAID 5 device configured, but one of the drives
 failed. I removed the drive with the following command...
 
 mdadm /dev/md0 --remove /dev/hdc1
 
 Now, when I try to insert the replacement drive back in, I get the
 following...
 
 [EMAIL PROTECTED] ~]# mdadm /dev/md0 --add /dev/hdc1
 mdadm: add new device failed for /dev/hdc1 as 5: Invalid argument
 
 [EMAIL PROTECTED] mdadm-2.6.4]# dmesg | tail
 ...
 md: hdc1 has invalid sb, not importing!
 md: md_import_device returned -22
 md: hdc1 has invalid sb, not importing!
 md: md_import_device returned -22
 
I've had the same error message trying to add a drive into an array
myself - in my case I'm almost certain it's because the drive is
slightly smaller than the others in the array (the array's currently
growing so I haven't delved any further yet).  Have you checked the
actual partition sizes?  Particularly if it's a different type of drive,
as drives from different manufacturers can vary in size by quite a large
amount.
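
A couple of quick checks, using the device names from your mail, to
compare the replacement partition against what the array expects:

  # size of the replacement partition, in 1K blocks
  grep hdc1 /proc/partitions

  # sizes the array is built around
  mdadm --detail /dev/md0 | grep -i size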

Cheers,
Robin


Re: Linux RAID Partition Offset 63 cylinders / 30% performance hit?

2007-12-19 Thread Robin Hill
On Wed Dec 19, 2007 at 09:50:16AM -0500, Justin Piszcz wrote:

 The (up to) 30% percent figure is mentioned here:
 http://insights.oetiker.ch/linux/raidoptimization.html

That looks to be referring to partitioning a RAID device - this'll only
apply to hardware RAID or partitionable software RAID, not to the normal
use case.  When you're creating an array out of standard partitions then
you know the array stripe size will align with the disks (there's no way
it cannot), and you can set the filesystem stripe size to align as well
(XFS will do this automatically).
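
The values can also be given explicitly if you don't trust the
autodetection - a sketch, with the chunk size and disk count purely as
assumptions:

  # e.g. a 4-disk RAID-5 with 64k chunks: su = chunk size, sw = number
  # of data disks (disks minus parity)
  mkfs.xfs -d su=64k,sw=3 /dev/md0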

I've actually done tests on this with hardware RAID to try to find the
correct partition offset, but wasn't able to see any difference (using
bonnie++ and moving the partition start by one sector at a time).

 # fdisk -l /dev/sdc

 Disk /dev/sdc: 150.0 GB, 150039945216 bytes
 255 heads, 63 sectors/track, 18241 cylinders
 Units = cylinders of 16065 * 512 = 8225280 bytes
 Disk identifier: 0x5667c24a

    Device Boot      Start         End      Blocks   Id  System
 /dev/sdc1               1       18241   146520801   fd  Linux raid autodetect

This looks to be a normal disk - the partition offsets shouldn't be
relevant here (barring any knowledge of the actual physical disk layout
anyway, and block remapping may well make that rather irrelevant).

That's my take on this one anyway.

Cheers,
Robin


Re: Raid over 48 disks

2007-12-18 Thread Robin Hill
On Tue Dec 18, 2007 at 12:29:27PM -0500, Norman Elton wrote:

 We're investigating the possibility of running Linux (RHEL) on top of Sun's 
 X4500 Thumper box:

 http://www.sun.com/servers/x64/x4500/

 Basically, it's a server with 48 SATA hard drives. No hardware RAID. It's 
 designed for Sun's ZFS filesystem.

 So... we're curious how Linux will handle such a beast. Has anyone run MD 
 software RAID over so many disks? Then piled LVM/ext3 on top of that? Any 
 suggestions?

 Are we crazy to think this is even possible?

The most I've done is 28 drives in RAID-10 (SCSI drives, with the array
formatted as XFS).  That keeps failing one drive, but I've not had time
to give the drive a full test yet to confirm it's a drive issue.  It's
been running quite happily (under pretty heavy database load) on 27
disks for a couple of months now though.

Cheers,
Robin


Re: Fwd: issues rebuilding raid array.

2007-10-22 Thread Robin Hill
On Mon Oct 22, 2007 at 09:46:08PM +1000, Sam Redfern wrote:

 Greetings happy mdadm users.
 
 I have a little problem that after many hours of searching around I
 couldn't seem to solve.
 
 I have upgraded my motherboard and kernel (bad practice I know but the
 ICH9R controller needs  2.6.2*+) at the same time.
 
 The array was built using 2.6.18-7. Now I'm using 2.6.21-2.
 
 I'm trying to recreate the raid array with the following command and
 this is the error I get:
 
 mca4:~# mdadm -Av /dev/md1 /dev/sdb /dev/sdc /dev/sdd /dev/sde
 /dev/sdf /dev/sdg
 mdadm: looking for devices for /dev/md1
 mdadm: no RAID superblock on /dev/sdc
 mdadm: /dev/sdc has no superblock - assembly aborted
 
You're trying to assemble the array from 6 disks here and one looks to
be dodgy.  That's okay so far.

 So I figure, oh look the disk sdc has gone cactus, I'll just remove it
 from the list. One of the advantages of mdadm.
 
 mca4:~# mdadm -Av /dev/md1 /dev/sdb /dev/sdd /dev/sde /dev/sdf /dev/sdg
 mdadm: looking for devices for /dev/md1
 mdadm: /dev/sdb is identified as a member of /dev/md1, slot -1.
 mdadm: /dev/sdd is identified as a member of /dev/md1, slot 0.
 mdadm: /dev/sde is identified as a member of /dev/md1, slot 1.
 mdadm: /dev/sdf is identified as a member of /dev/md1, slot 5.
 mdadm: /dev/sdg is identified as a member of /dev/md1, slot 4.
 mdadm: added /dev/sde to /dev/md1 as 1
 mdadm: no uptodate device for slot 2 of /dev/md1
 mdadm: no uptodate device for slot 3 of /dev/md1
 mdadm: added /dev/sdg to /dev/md1 as 4
 mdadm: added /dev/sdf to /dev/md1 as 5
 mdadm: failed to add /dev/sdb to /dev/md1: Invalid argument
 mdadm: added /dev/sdd to /dev/md1 as 0
 mdadm: /dev/md1 assembled from 4 drives - not enough to start the array.
 
Now you're trying to assemble with 5 disks and getting 4 out of 6 in the
array, and one at slot -1 (i.e. a spare).

 I found this really difficult to understand considering that I can
 get the output of mdadm -E /dev/sdb (other disks included to overload
 you with information)
 
 mdadm -E /dev/sd[b-h]
 
 /dev/sdb:
   Magic : a92b4efc
 Version : 00.90.00
UUID : 4e3b82e1:f5604e19:a9c9775f:49745adf
   Creation Time : Fri Oct  5 09:18:25 2007
  Raid Level : raid5
 Device Size : 312571136 (298.09 GiB 320.07 GB)
  Array Size : 1562855680 (1490.46 GiB 1600.36 GB)
Raid Devices : 6
   Total Devices : 6
 Preferred Minor : 1
 
 Update Time : Tue Oct 16 20:03:13 2007
   State : clean
  Active Devices : 6
 Working Devices : 6
  Failed Devices : 0
   Spare Devices : 0
Checksum : 80d47486 - correct
  Events : 0.623738
 
  Layout : left-symmetric
  Chunk Size : 64K
 
   Number   Major   Minor   RaidDevice State
 this 6   8   16   -1  spare   /dev/sdb
 
0 0   8   800  active sync   /dev/sdf
1 1   8  1281  active sync   /dev/.static/dev/sdi
2 2   8  1442  active sync   /dev/.static/dev/sdj
3 3   8   163  active sync   /dev/sdb
4 4   8   644  active sync   /dev/sde
5 5   8   965  active sync   /dev/sdg

And here we see that the array has 6 active devices and a spare.  You
currently have 4 working active devices, a failed active device and the
spare.  What's happened to the other device?  You can't get the array
working with 4 out of 6 devices so you'll need to either find the other
active device (and rebuild onto the spare) or get the failed disk
working again.
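
As a starting point (device names taken from your mails), it's worth
dumping the superblock summary from every candidate disk and comparing
the "this" lines and event counts - that should show which physical
device was holding slots 2 and 3, and whether the missing member is
simply appearing under a different name after the motherboard change:

  for d in /dev/sd[b-h]; do
      echo "== $d =="
      mdadm --examine $d 2>/dev/null | egrep 'UUID|Events|this'
  done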

HTH,
Robin


Re: Partitions with == or \approx same size ?

2007-07-20 Thread Robin Hill
On Fri Jul 20, 2007 at 07:54:54PM +0200, Seb wrote:

 But the number of blocks cannot be imposed when creating a partition,
 only the number of cylinders.
 
If you hit 'u' in fdisk then you can switch to sector units and create
partitions by sector rather than by cylinder.
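
Roughly, an illustrative session (device name assumed):

  fdisk -u /dev/sdb     # -u starts fdisk in sector units
  # or press 'u' inside fdisk to toggle the units, then:
  #   n   create a new partition, giving start/end in sectors
  #   p   print the table to check the exact sector counts
  #   w   write the table and exit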

HTH,
Robin


Re: Software RAID5 Horrible Write Speed On 3ware Controller!!

2007-07-18 Thread Robin Hill
On Wed Jul 18, 2007 at 01:26:11PM +0200, Hannes Dorbath wrote:

 I think what you might be experiencing is that XFS can read su,sw values 
 from the MD device and will automatically optimize itself, while it 
 can't do that for the HW RAID device. It is absolutely essential to 
 align your file system, to prevent implicit reads, needed for parity 
 calculations.
 
 Set su to the stripe size you have configured in your controller (like 
 128K) and sw to 9 (for a 10 disk RAID 5 array).
 
Just to pick up on this one (as I'm about to reformat my array as XFS) -
does this actually work with a hardware controller?  Is there any
assurance that the XFS stripes align with the hardware RAID stripes?  Or
could you just end up offsetting everything so that every 128k chunk on
the XFS side of things fits half-and-half into two hardware raid chunks
(especially if the array has been partitioned)?  In which case would
it be better (performance-wise) to provide the su,sw values or not?
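
For reference, the sort of invocation being discussed would be roughly
the following (stripe values as suggested above, device name assumed),
versus just letting mkfs.xfs use its defaults since it can't query a
hardware controller:

  # explicit alignment: 128k hardware stripe, 10-disk RAID-5 = 9 data disks
  mkfs.xfs -d su=128k,sw=9 /dev/sda1

  # no hints at all - no alignment is attempted
  mkfs.xfs /dev/sda1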

I'm planning on doing some benchmarking first, but thought I'd check
whether there are any definitive answers already.

Cheers,
Robin


Re: Questions about the speed when MD-RAID array is being initialized.

2007-05-10 Thread Robin Hill
On Thu May 10, 2007 at 05:33:17PM -0400, Justin Piszcz wrote:

 
 
 On Thu, 10 May 2007, Liang Yang wrote:
 
 Hi,
 
 I created a MD-RAID5 array using 8 Maxtor SAS Disk Drives (chunk size is 
 256k). I have measured the data transfer speed for single SAS disk drive 
 (physical drive, not filesystem on it), it is roughly about 80~90MB/s.
 
 However, I notice MD also reports the speed for the RAID5 array when it is 
 being initialized (cat /proc/mdstat). The speed reported by MD is not 
 constant which is roughly from 70MB/s to 90MB/s (average is 85MB/s which 
 is very close to the single disk data transfer speed).
 
 I just have three questions:
 1. What is the exact meaning of the array speed reported by MD? Is that 
 measured for the whole array (I used 8 disks) or for just a single underlying 
 disk? If it is for the whole array, then 70~90MB/s seems too low 
 considering 8 disks are used for this array.
 
 2. How is this speed measured and what is the I/O packet size being used 
 when the speed is measured?
 
 3. From the beginning when the MD-RAID5 array is initialized to the end when 
 the initialization is done, the speed reported by MD gradually decreases from 
 90MB/s down to 70MB/s. Why does the speed change? Why does the speed 
 gradually decrease?
 
 Could anyone give me some explanation?
 
 I'm using RHEL 4U4 with 2.6.18 kernel. MDADM version is 1.6.
 
 Thanks a lot,
 
 Liang
 
 
 For no 3. because it starts from the fast end of the disk and works its 
 way to the slower part (slower speeds).
 
And I'd assume for no 1 it's because it's only writing to a single disk
at this point, so will obviously be limited to the transfer rate of a
single disk.  RAID5 arrays are created as a degraded array, then the
final disk is recovered - this is done so that the array is ready for
use very quickly.  So what you're seeing in /proc/mdstat is the speed in
calculating and writing the data for the final drive (and is, unless
computationally limited, going to be the write speed of the single
drive).
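
If you want to see this happening, the usual (standard-tool) checks are
below - during the initial build one member should show up as a spare
being rebuilt while the rest are active:

  watch -n 5 cat /proc/mdstat
  mdadm --detail /dev/md0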

HTH,
Robin
