Purpose of Document? (was Re: draft howto on making raids for surviving a disk crash)

2008-02-06 Thread Moshe Yudkowsky
I read through the document, and I've signed up for a Wiki account so I 
can edit it.


One of the things I wanted to do was correct the title. I see that there 
are *three* different Wiki pages about how to build a system that boots 
from RAID. None of them are complete yet.


So, what is the purpose of this page? I think the purpose is a complete 
description of how to use RAID to build a system that not only boots 
from RAID but is robust against other hazards such as file system 
corruption.


--
Moshe Yudkowsky * [EMAIL PROTECTED] * www.pobox.com/~moshe
 If you pay peanuts, you get monkeys.
  Edward Yourdon, _The Decline and Fall of the American Programmer_


Re: Deleting mdadm RAID arrays

2008-02-05 Thread Moshe Yudkowsky


1. Where does this info on the array reside?! I have deleted /etc/mdadm/mdadm.conf 
and the /dev/md devices, and yet it comes seemingly out of nowhere.


/boot has a copy of mdadm.conf so that / and other drives can be started 
and then mounted. update-initramfs will update /boot's copy of mdadm.conf.
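
For instance, a minimal sketch of refreshing that copy after editing the 
configuration (the verification step assumes a single gzip-compressed cpio 
initramfs, as Debian etch uses):

# after editing /etc/mdadm/mdadm.conf, rebuild every installed initramfs
update-initramfs -u -k all

# optional sanity check: list the copy inside the current image
zcat /boot/initrd.img-$(uname -r) | cpio -it 2>/dev/null | grep mdadm.conf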


--
Moshe Yudkowsky * [EMAIL PROTECTED] * www.pobox.com/~moshe


Re: Deleting mdadm RAID arrays

2008-02-05 Thread Moshe Yudkowsky

Michael Tokarev wrote:

Janek Kozicki wrote:

Marcin Krol said: (by the date of Tue, 5 Feb 2008 11:42:19 +0100)


2. How can I delete that damn array so it doesn't hang my server up in a loop?

dd if=/dev/zero of=/dev/sdb1 bs=1M count=10


This works provided the superblocks are at the beginning of the
component devices.  Which is not the case by default (0.90
superblocks, at the end of components), or with 1.0 superblocks.

  mdadm --zero-superblock /dev/sdb1


Would that work even if he doesn't update his mdadm.conf inside the 
/boot image? Or would mdadm attempt to build the array according to the 
instructions in mdadm.conf? I expect that it might depend on whether the 
instructions are given in terms of UUID or in terms of devices.


--
Moshe Yudkowsky * [EMAIL PROTECTED] * www.pobox.com/~moshe
 I think it a greater honour to have my head standing on the ports
  of this town for this quarrel, than to have my portrait in the
  King's bedchamber. -- Montrose, 20 May 1650


Re: RAID needs more to survive a power hit, different /boot layout for example (was Re: draft howto on making raids for surviving a disk crash)

2008-02-04 Thread Moshe Yudkowsky

Michael Tokarev wrote:

Moshe Yudkowsky wrote:
[]

But that's *exactly* what I have -- well, 5 GB -- and it failed. I've
modified /etc/fstab to use data=journal (even on root, which I
thought wasn't supposed to work without a grub option!) and I can
power-cycle the system and bring it up reliably afterwards.


Note also that data=journal effectively doubles the write time.
It's a bit faster for small writes (because all writes are first
done into the journal, i.e. into the same place, so no seeking
is needed), but for larger writes, the journal will become full
and data found in it needs to be written to proper place, to free
space for new data.  Here, if you'll continue writing, you will
have more than 2x speed degradation, because of a) double writes,
and b) more seeking.


The alternative seems to be that portions of the / file system won't 
mount because the file system is corrupted on a crash while writing.


If I'm reading the man pages, Wikis, READMEs and mailing lists correctly 
--  not necessarily the case -- the ext3 file system uses the equivalent 
of data=journal as a default.


The question then becomes what data scheme to use with reiserfs on the 
remainder of the file system: /usr, /var, /home, and the others. If they 
can recover on a reboot using fsck and the default configuration of 
reiserfs, then I have no problem using them. But my understanding is 
that data can be lost or destroyed if there's a crash during a write; 
in that case there's little point in running a RAID system that can 
collect corrupt data.


Another way to phrase this: unless you're running data-center grade 
hardware and have absolute confidence in your UPS, you should use 
data=journal for reiserfs and perhaps avoid XFS entirely.
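
A minimal sketch of what that looks like in practice -- device names and 
mount points here are illustrative, not my actual layout:

# /etc/fstab: journal file data as well as metadata
/dev/md/root   /       reiserfs   defaults,data=journal   0  1
/dev/md/home   /home   reiserfs   defaults,data=journal   0  2

# for the root filesystem the option can also be passed from the boot
# loader, e.g. on the kernel line in menu.lst:
#   kernel /vmlinuz-... root=/dev/md/root rootflags=data=journal ro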



--
Moshe Yudkowsky * [EMAIL PROTECTED] * www.pobox.com/~moshe
Right in the middle of a large field where there had never been a trench
was a shell hole... 8 feet deep by 15 across. On the edge of it was a
dead... rat not over twice the size of a mouse. No wonder the war costs
so much. -- Col. George Patton



Re: RAID needs more to survive a power hit, different /boot layout for example (was Re: draft howto on making raids for surviving a disk crash)

2008-02-04 Thread Moshe Yudkowsky

Robin, thanks for the explanation. I have a further question.

Robin Hill wrote:


Once the file system is mounted then hdX,Y maps according to the
device.map file (which may actually bear no resemblance to the drive
order at boot - I've had issues with this before).  At boot time it maps
to the BIOS boot order though, and (in my experience anyway) hd0 will
always map to the drive the BIOS is booting from.


At the time that I use grub to write to the MBR, hd2,1 is /dev/sdc1. 
Therefore, I don't quite understand why this would not work:


grub <<EOF
root (hd2,1)
setup (hd2)
EOF

This would seem to be a command to have the MBR on hd2 written to use 
the boot on hd2,1. It's valid when written. Are you saying that it's a 
command for the MBR on /dev/sdc to find the data on (hd2,1), the 
location of which might change at any time? That's... a  very strange 
way to write the tool. I thought it would be a command for the MBR on 
hd2 (sdc) to look at hd2,1 (sdc1) to find its data, regardless of the 
boot order that caused sdc to be the boot disk.
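
(For reference, the post-boot mapping Robin describes lives in grub's 
device.map -- typically /boot/grub/device.map -- and on a four-disk box 
looks roughly like this, while at boot time hd0 is simply whichever disk 
the BIOS booted from:)

(hd0)   /dev/sda
(hd1)   /dev/sdb
(hd2)   /dev/sdc
(hd3)   /dev/sdd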


--
Moshe Yudkowsky * [EMAIL PROTECTED] * www.pobox.com/~moshe
 Bring me the head of Prince Charming.
-- Robert Sheckley  Roger Zelazny


Re: RAID needs more to survive a power hit, different /boot layout for example (was Re: draft howto on making raids for surviving a disk crash)

2008-02-04 Thread Moshe Yudkowsky

Eric,

Thanks very much for your note. I'm becoming very leery of reiserfs at 
the moment... I'm about to run another series of crash tests.


Eric Sandeen wrote:

Justin Piszcz wrote:


Why avoid XFS entirely?

esandeen, any comments here?


Heh; well, it's the meme.


Well, yeah...


Note also that ext3 has the barrier option as well, but it is not
enabled by default due to performance concerns.  Barriers also affect
xfs performance, but enabling them in the non-battery-backed-write-cache
scenario is the right thing to do for filesystem integrity.


So if I understand you correctly, you're stating that currently the most 
reliable fs in its default configuration, in terms of protection against 
power-loss scenarios, is XFS?



--
Moshe Yudkowsky * [EMAIL PROTECTED] * www.pobox.com/~moshe
 There is something fundamentally wrong with a country [USSR] where
  the citizens want to buy your underwear.  -- Paul Theroux


Re: using update-initramfs: how to get new mdadm.conf into the /boot? Or is it XFS?

2008-02-04 Thread Moshe Yudkowsky

Robin Hill wrote:


"File not found" at that point would suggest it can't find the kernel
file.  The path here should be relative to the root of the partition
/boot is on, so if your /boot is its own partition then you should
either use "kernel /vmlinuz" or (the more usual solution from what
I've seen) make sure there's a symlink:
ln -s . /boot/boot


Robin,

Thanks very much! ln -s . /boot/boot works to get past this problem.

Now it's failed in a different section and complains that it can't find 
/sbin/init. I'm at the (initramfs) prompt, which I don't ever recall 
seeing before. I can't  mount /dev/md/root on any mount points (invalid 
arguments even though I'm not supplying any). I've checked /dev/md/root 
and it does work as expected when I try mounting it while in my 
emergency partition, and it does contain /sbin/init and the other files 
and mount points for /var, /boot, /tmp, etc.


So this leads me to the question of why /sbin isn't being seen. /sbin is 
on the device /dev/md/root, and /etc/fstab specifically mounts it at /. 
 I would think /boot would look at an internal copy of /etc/fstab. Is 
this another side effect of using /boot on its own partition?


--
Moshe Yudkowsky * [EMAIL PROTECTED] * www.pobox.com/~moshe
 Blessed are the peacemakers,
  for they shall be mowed down in the crossfire.
-- Michael Flynn


Re: using update-initramfs: how to get new mdadm.conf into the /boot? Or is it XFS?

2008-02-04 Thread Moshe Yudkowsky

maximilian attems wrote:


error 15 is a *grub* error.

grub is known for its dislike of xfs, so with this whole setup use ext3;
rerun grub-install and you should be fine.


I should mention that something *did* change. When attempting to use 
XFS, grub would give me a note about 18 partitions used (I forget the 
exact language). This was different from what I'd remembered; when I 
switched back to using reiserfs, grub reported using 19 partitions.


So there's something definitely interesting about XFS and booting.

As an additional note, if I use the grub boot-time commands to edit root 
to read, e.g., root=/dev/sda2 or root=/dev/sdb2, I get the same Error 15 
error message.


It may be that grub is complaining about grub and reiserfs, but I 
suspect that it has a true complaint about the file system and what's on 
the partitions.


--
Moshe Yudkowsky * [EMAIL PROTECTED] * www.pobox.com/~moshe
 If, after hearing my songs, just one human being is inspired to
  say something nasty to a friend, it will all have been worthwhile.
-- Tom Lehrer


using update-initramfs: how to get new mdadm.conf into the /boot? Or is it XFS?

2008-02-04 Thread Moshe Yudkowsky

I've managed to get myself into a little problem.

Since power hits were taking out the /boot partition, I decided to split 
/boot out of root. Working from my emergency partition,  I copied all 
files from /root, re-partitioned what had been /root into room for /boot 
and /root, and then created the drives. This left me with /dev/md/boot, 
/dev/md/root, and /dev/md/base (everything else).


I modified mdadm.conf on the emergency partition, used update-initramfs 
to make certain that the new md drives would be recognized, and 
rebooted. This worked as expected.


I then mounted the entire new file system on a mount point, copied 
mdadm.conf to that point, did a chroot to that point, and ran 
update-initramfs so that the non-emergency partition would have the 
updated mdadm.conf. This worked -- but with complaints about the missing 
file /proc/modules (which is not present under chroot). If I use the -v 
option I can see the raid456, raid1, etc. modules loading.


I modified menu.lst to make certain that boot=/dev/md/boot, and ran grub 
(thanks, Robin!) successfully.


Problem: on reboot, I get an error message:

root (hd0,1)  (Moshe comment: as expected)
Filesystem type is xfs, partition type 0xfd (Moshe comment: as expected)
kernel /boot/vmlinuz-etc.-amd64 root=/dev/md/boot ro

Error 15: File not found

Did I miss something? I'm pretty certain this is the procedure I used 
before. The XFS module is being loaded by update-initramfs, so unless 
there's a reason that I can't boot md from a boot partition with the 
XFS file system, I don't understand what the problem is.


Comments welcome -- I'm wedged!


--
Moshe Yudkowsky * [EMAIL PROTECTED] * www.pobox.com/~moshe
Many that live deserve death. And some that die deserve life. Can you
give it to them? Then do not be too eager to deal out death in judgement.
For even the wise cannot see all ends.
-- Gandalf (J.R.R. Tolkien)


Re: using update-initramfs: how to get new mdadm.conf into the /boot? Or is it XFS?

2008-02-04 Thread Moshe Yudkowsky

I wrote:

Now it's failed in a different section and complains that it can't find 
/sbin/init. I'm at the (initramfs) prompt, which I don't ever recall 
seeing before. I can't  mount /dev/md/root on any mount points (invalid 
arguments even though I'm not supplying any). I've checked /dev/md/root 
and it does work as expected when I try mounting it while in my 
emergency partition, and it does contain /sbin/init and the other files 
and mount points for /var, /boot, /tmp, etc.


So this leads me to the question of why /sbin isn't being seen. /sbin is 
on the device /dev/md/root, and /etc/fstab specifically mounts it at /. 
 I would think /boot would look at an internal copy of /etc/fstab. Is 
this another side effect of using /boot on its own partition?


The answer: I managed to make a mistake in the grub configuration, in 
/boot/grub/menu.lst. I'd changed root= from /dev/md/root to 
/dev/md/boot -- but root= must point to the *root* location, which 
does not change, not to the boot location, which is not relevant here.
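
In other words, with a separate /boot partition the menu.lst entry ends up 
looking something like this (kernel version and device names illustrative):

title  Debian GNU/Linux, kernel 2.6.18-5-amd64
root   (hd0,1)
kernel /vmlinuz-2.6.18-5-amd64 root=/dev/md/root ro
initrd /initrd.img-2.6.18-5-amd64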


--
Moshe Yudkowsky * [EMAIL PROTECTED] * www.pobox.com/~moshe
The central tenet of Buddhism is not 'Every man for himself.'
-- Wanda


RAID needs more to survive a power hit, different /boot layout for example (was Re: draft howto on making raids for surviving a disk crash)

2008-02-03 Thread Moshe Yudkowsky
I've been reading the draft and checking it against my experience. 
Because of local power fluctuations, I've just accidentally tested my 
system: my system does *not* survive a power hit. This has happened 
twice already today.


I've got /boot and a few other pieces in a 4-disk RAID 1 (three running, 
one spare). This partition is on /dev/sd[abcd]1.


I've used grub to install grub on all three running disks:

grub --no-floppy <<EOF
root (hd0,1)
setup (hd0)
root (hd1,1)
setup (hd1)
root (hd2,1)
setup (hd2)
EOF

(To those reading this thread to find out how to recover: According to 
grub's map option, /dev/sda1 maps to hd0,1.)



After the power hit, I get:

 Error 16
 Inconsistent filesystem mounted

I then tried to boot up on hda1,1, hdd2,1 -- none of them worked.

The culprit, in my opinion, is the reiserfs file system. During the 
power hit, the reiserfs file system of /boot was left in an inconsistent 
state; this meant I had up to three bad copies of /boot.


Recommendations:

1. I'm going to try adding a data=journal option to the reiserfs file 
systems, including the /boot. If this does not work, then /boot must be 
ext3 in order to survive a power hit.


2. We discussed what should be on the RAID1 bootable portion of the 
filesystem. True, it's nice to have the ability to boot from just the 
RAID1 portion. But if that RAID1 portion can't survive a power hit, 
there's little sense. It might make a lot more sense to put /boot on its 
own tiny partition.


The Fix:

The way to fix this problem with booting is to get the reiserfs file 
system back into sync. I did this by booting to my emergency single-disk 
partition ((hd0,0) if you must know) and then mounting the /dev/md/root 
that contains /boot. This forced a reiserfs consistency check and 
journal replay, and let me reboot without problems.
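
For anyone following along, the recovery amounted to something like this 
(array and mount-point names illustrative):

# from the emergency partition: assemble the array that holds /boot
# (if it is not already running), then mount it -- mounting forces the
# reiserfs consistency check and journal replay
mdadm --assemble /dev/md/root /dev/sd[abcd]1
mount /dev/md/root /mnt
umount /mnt
reboot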




--
Moshe Yudkowsky * [EMAIL PROTECTED] * www.pobox.com/~moshe
A gun is, in many people's minds, like a magic wand. If you point it at
people, they are supposed to do your bidding.
-- Edwin E. Moise, _Tonkin Gulf_


Re: RAID needs more to survive a power hit, different /boot layout for example (was Re: draft howto on making raids for surviving a disk crash)

2008-02-03 Thread Moshe Yudkowsky

Robin Hill wrote:


This is wrong - the disk you boot from will always be hd0 (no matter
what the map file says - that's only used after the system's booted).
You need to remap the hd0 device for each disk:

grub --no-floppy <<EOF
root (hd0,1)
setup (hd0)
device (hd0) /dev/sdb
root (hd0,1)
setup (hd0)
device (hd0) /dev/sdc
root (hd0,1)
setup (hd0)
device (hd0) /dev/sdd
root (hd0,1)
setup (hd0)
EOF


For my enlightenment: if the file system is mounted, then (hd2,1) is a 
sensible grub reference, isn't it? For the record, given my original 
script, when I boot I am able to edit the grub boot options to read


root (hd2,1)

and proceed to boot.


--
Moshe Yudkowsky * [EMAIL PROTECTED] * www.pobox.com/~moshe
 I love deadlines... especially the whooshing sound they
  make as they fly past.
-- Dermot Dobson


Re: RAID needs more to survive a power hit, different /boot layout for example (was Re: draft howto on making raids for surviving a disk crash)

2008-02-03 Thread Moshe Yudkowsky

Michael Tokarev wrote:


Speaking of repairs.  As I already mentioned, I always use small
(256M..1G) raid1 array for my root partition, including /boot,
/bin, /etc, /sbin, /lib and so on (/usr, /home, /var are on
their own filesystems).  And I had the following scenarios
happened already:


But that's *exactly* what I have -- well, 5 GB -- and it failed. I've 
modified /etc/fstab to use data=journal (even on root, which I 
thought wasn't supposed to work without a grub option!) and I can 
power-cycle the system and bring it up reliably afterwards.


So I'm a little suspicious of this theory that /etc and others can be on 
the same partition as /boot in a non-ext3 file system.


--
Moshe Yudkowsky * [EMAIL PROTECTED] * www.pobox.com/~moshe
 Thanks to radio, TV, and the press we can now develop absurd
  misconceptions about peoples and governments we once hardly knew
  existed.-- Charles Fair, _From the Jaws of Victory_


Re: In this partition scheme, grub does not find md information?

2008-01-30 Thread Moshe Yudkowsky

David Greaves wrote:


Moshe Yudkowsky wrote:

I expect it's because I used 1.2 superblocks (why
not use the latest, I said, foolishly...) and therefore the RAID10 --


Aha - an 'in the wild' example of why we should deprecate '0.9 1.0 1.1, 1.2' and
rename the superblocks to data-version + on-disk-location :)


Even if renamed, I'd still need a Clue as to why to prefer one scheme 
over the other. For example, I've now learned that if I want to set up a 
RAID1 /boot, it must actually be 1.0 or grub won't be able to read it. 
(I would therefore argue that if the new version ever becomes default, 
then the default sub-version ought to be 1.0.)


As to the wiki: I am not certain I found the Wiki you're referring to; I 
did find others, and none had the ringing clarity of Peter's definitive 
"RAID10 won't work for /boot."


The process I'm going through -- cloning an old amd-k7 server into a new 
amd64 server -- is something I will document, and this particular grub 
issue is one of the things I intend to mention. So, where is this Wiki 
of which you speak?


--
Moshe Yudkowsky * [EMAIL PROTECTED] * www.pobox.com/~moshe
 A kind word will go a long way, but a kind word and
  a gun will go even further.
-- Al Capone


Re: In this partition scheme, grub does not find md information?

2008-01-30 Thread Moshe Yudkowsky

Michael Tokarev wrote:


You only write to root (including /bin and /lib and so on) during
software (re)install and during some configuration work (writing
/etc/passwd and the like).  The first is very infrequent, and both
need only a few writes -- so write speed isn't important.


Thanks, but I didn't make myself clear. The performance problem I'm 
concerned about was having different md drives accessing different 
partitions.


For example, I can partition the drives as follows:

/dev/sd[abcd]1 -- RAID1, /boot

/dev/sd[abcd]2 -- RAID5, the rest of the file system

I originally had asked, way back when, if having different md drives on 
different partitions of the *same* disk was a problem for performance -- 
or if, for some reason (e.g., threading), it was actually smarter to do 
it that way. The answer I received was from Iustin Pop, who said:


Iustin Pop wrote:

md code works better if it's only one array per physical drive,
because it keeps statistics per array (like last accessed sector,
etc.) and if you combine two arrays on the same drive these
statistics are not exactly true anymore


So if I use /boot on its own drive and it's only accessed at startup, 
the /boot will only be accessed that one time and afterwards won't cause 
problems for the drive statistics. However, if I put /boot, /bin, 
and /sbin on this RAID1 drive, it will always be accessed and it might 
create a performance issue.


To return to that performance question, since I have to create at least 2 
md drives using different partitions, I wonder if it's smarter to create 
multiple md drives for better performance.


/dev/sd[abcd]1 -- RAID1, the /boot, /dev, /bin/, /sbin

/dev/sd[abcd]2 -- RAID5, most of the rest of the file system

/dev/sd[abcd]3 -- RAID10 o2, a drive that does a lot of downloading (writes)
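
A sketch of how such a split might be created -- levels and layout as 
proposed above, everything else (names, options) illustrative:

mdadm --create /dev/md0 --level=1  --raid-devices=4 /dev/sd[abcd]1
mdadm --create /dev/md1 --level=5  --raid-devices=4 /dev/sd[abcd]2
mdadm --create /dev/md2 --level=10 --layout=o2 --raid-devices=4 /dev/sd[abcd]3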


For typical filesystem usage, raid5 works well for both reads
and (cached, delayed) writes.  It's workloads like databases
where raid5 performs badly.


Ah, very interesting. Is this true even for (dare I say it?) bittorrent 
downloads?



What you do care about is your data integrity.  It's not really
interesting to reinstall a system or lose your data in case if
something goes wrong, and it's best to have recovery tools as
easily available as possible.  Plus, amount of space you need.


Sure, I understand. And backing up in case someone steals your server. 
But did you have something specific in mind when you wrote this? Don't 
all these configurations (RAID5 vs. RAID10) have equal recovery tools?


Or were you referring to the file system? Reiserfs and XFS both seem to 
have decent recovery tools. LVM is a little tempting because it allows 
for snapshots, but on the other hand I wonder if I'd find it useful.




Also, placing /dev on a tmpfs helps a lot to minimize the number of writes
necessary for the root fs.

Another interesting idea. I'm not familiar with using tmpfs (no need,
until now); but I wonder how you create the devices you need when you're
doing a rescue.


When you start udev, your /dev will be on tmpfs.


Sure, that's what mount shows me right now -- using a standard Debian 
install. What did you suggest I change?



--
Moshe Yudkowsky * [EMAIL PROTECTED] * www.pobox.com/~moshe
Many that live deserve death. And some that die deserve life. Can you
give it to them? Then do not be too eager to deal out death in judgement.
For even the wise cannot see all ends.
-- Gandalf (J.R.R. Tolkien)


Loop devices to RAID? (was Re: In this partition scheme, grub does not find md information?)

2008-01-30 Thread Moshe Yudkowsky

Peter Rabbitson wrote:
It seems like it. I just created the above raid configuration with 5 
loop devices. Everything behaved just like Michael described. When the 
wrong drives disappeared - I started getting IO errors.


My mind boggles. I know how to mount an ISO as a loop device onto the 
file system, but if you'd be so kind, can you give a super-brief 
description of how to get a loop device to look like an actual partition 
that can be made into a RAID array? I can see this software-only 
solution as being quite interesting for testing in general.
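
For anyone else wondering, the basic recipe is roughly the following 
(sizes, file names, and device numbers illustrative, and not necessarily 
exactly what Peter did):

# create four small backing files and attach them to loop devices
for i in 0 1 2 3; do
    dd if=/dev/zero of=/tmp/raidtest$i bs=1M count=100
    losetup /dev/loop$i /tmp/raidtest$i
done

# the loop devices can then be used as RAID components like any block device
mdadm --create /dev/md9 --level=10 --raid-devices=4 /dev/loop[0-3]

# tear down afterwards
mdadm --stop /dev/md9
for i in 0 1 2 3; do losetup -d /dev/loop$i; done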


--
Moshe Yudkowsky * [EMAIL PROTECTED] * www.pobox.com/~moshe
 I'm very well acquainted/with the seven deadly sins/
  I keep a busy schedule/ to try to fit them in.
-- Warren Zevon, Mr. Bad Example


Documentation? failure to update-initramfs causes Infinite md loop on boot

2008-01-30 Thread Moshe Yudkowsky
I reformatted the disks in preparation for my move to a RAID1/RAID5 
combination. I couldn't --stop the array (that should have told me 
something), so I removed the ARRAY lines from mdadm.conf and restarted. 
I ran fdisk to create the proper partitions, and then I removed the /dev/md* 
and /dev/md/* entries in anticipation of creating the new ones. I then 
rebooted to pick up the new partitions I'd created.


Now I can no longer boot, with this series of messages:

md: md_import_device returned: -22
md: mdadm failed to add /dev/sdb2 to /dev/md/all: invalid argument
mdadm: failed to RUN_ARRAY /dev/md/all: invalid argument
md: sdc2 has invalid sb, not importing!

Thousands of these go past, and there's no escape. That's quite a severe 
error. I'm going to boot on a rescue disk to fix this -- there's no 
other way I can think of to get out of this mess -- but I wonder if 
there ought to be documentation on the interaction between mdadm and 
update-initramfs.


--
Moshe Yudkowsky * [EMAIL PROTECTED] * www.pobox.com/~moshe
Becoming the biggest banana republic in the world -- and without the 
bananas, at that -- is an unenviable prospect.

-- Sergei Stepashin, Prime Minister of Russia


Re: Documentation? failure to update-initramfs causes Infinite md loop on boot

2008-01-30 Thread Moshe Yudkowsky

maximilian attems wrote:


pretty simple: when you change mdadm.conf, put it in the initramfs too:
update-initramfs -u -k all


Sure, that's what I did after booting the rescue disk, chrooting, etc. 
However, I wonder if the *documentation* -- Wiki, or even the man page 
discussion on boot -- should mention that changes in mdadm.conf have to be 
propagated to /boot?


In fact, that's an interesting question: which changes have to propagate 
to /boot? I'd think any change that affects md devices with /etc/fstab 
entries set to auto would have to be followed by update-initramfs. 
That's quite some bit of hidden knowledge if true.


--
Moshe Yudkowsky * [EMAIL PROTECTED] * www.pobox.com/~moshe
  All friends have real and imaginary components.
 -- Moshe Yudkowsky


Re: In this partition scheme, grub does not find md information?

2008-01-29 Thread Moshe Yudkowsky
Neil, thanks for writing. A couple of follow-up questions to you and the 
group:


Neil Brown wrote:

On Monday January 28, [EMAIL PROTECTED] wrote:
Perhaps I'm mistaken, but I thought it was possible to boot from 
/dev/md/all1.


It is my understanding that grub cannot boot from RAID.


Ah. Well, even though LILO seems to be less classy and in current 
disfavor, can I boot RAID10/RAID5 from LILO?



You can boot from raid1 by the expedient of booting from one of the
halves.


One of the puzzling things about this is that I conceive of RAID10 as 
two RAID1 pairs, with RAID0 on top to join them into a large drive. 
However, when I use --level=10 to create my md drive, I cannot find out 
which two pairs are the RAID1's: the --detail doesn't give that 
information. Re-reading the md(4) man page, I think I'm badly mistaken 
about RAID10.


Furthermore, since grub cannot find the /boot on the md drive, I deduce 
that RAID10 isn't what the 'net descriptions say it is.



A common approach is to make a small raid1 which contains /boot and
boot from that.  Then use the rest of your devices for raid10 or raid5
or whatever.


Ah. My understanding from a previous question to this group was that 
using one partition of the drive for RAID1 and the other for RAID5 would 
(a) create inefficiencies in read/write cycles as the two different md 
drives maintained conflicting internal tables of the overall physical 
drive state and (b) would create problems if one or the other failed.


Under the alternative solution (booting from half of a raid1), since I'm 
booting from just one of the halves of the raid1, I would have to set up 
grub on both halves. If one physical drive fails, grub would fail over 
to the next device.


(My original question was prompted by my theory that multiple RAID5s, 
built out of different partitions, would be faster than a single large 
drive -- more threads to perform calculations during writes to different 
parts of the physical drives.)



Am I trying to do something that's basically impossible?


I believe so.


If the answers above don't lead to a resolution, I can create two RAID1 
pairs and join them using LVM. I would take a hit by using LVM to tie 
the pairs instead of RAID0, I suppose, but I would avoid the performance 
hit of multiple md drives on a single physical drive, and I could even 
run a hot spare through a sparing group. Any comments on the performance 
hit -- is RAID 1L a really bad idea for some reason?


--
Moshe Yudkowsky * [EMAIL PROTECTED] * www.pobox.com/~moshe
 It's a sobering thought, for example, to realize that by the time
  he was my age, Mozart had been dead for two years.
-- Tom Lehrer


Yes, but please provide the clue (was Re: [PATCH] Use new sb type)

2008-01-29 Thread Moshe Yudkowsky


* The only raid level providing unfettered access to the underlying 
filesystem is RAID1 with a superblock at its end, and it has been common 
wisdom for years that you need a RAID1 boot partition in order to boot 
anything at all.


Ah. This shines light on my problem...

The problem is that these three points do not affect any other raid 
level (as you cannot boot from any of them in a reliable fashion 
anyway). I saw a number of voices saying that backward compatibility must 
be preserved. I don't see any need for that because:


* The distro managers will definitely RTM and will adjust their flashy 
GUIs to do the right thing by explicitly supplying -e 1.0 for boot devices


The Debian stable distro won't let you create /boot on an LVM RAID1, but 
that seems to be the extent of current RAID awareness. Using the GUI, if 
you create a large RAID5 and attempt to boot off it -- well, you're 
toast, but you don't find out until the LILO and grub portion of the 
installation fails.


* A clueless user might burn himself by making a single root on a single 
raid1 device. But wait - he can burn himself the same way by making the 
root a raid5 device and rebooting.


Okay, but:

Why do we sacrifice the right thing to do? To eliminate the 
possibility of someone shooting himself in the foot by not reading the 
manual?


Speaking for clueless users everywhere: I'd love to Read The Fine 
Manual, but the Fine md, mdadm, and mdadm.conf Manuals that I've read 
don't have information about grub/LILO issues. A hint such as "grub and 
LILO can only work from RAID 1, and superblocks greater than 1.0 will 
toast your system in any case" is crucial information to have. Not 
everyone will catch this particular thread -- they're going to RTFM and 
make a mistake *regardless*.


And now, please excuse me while I RTFM to find out if I can change the 
superblocks to 1.0 from 1.2 on a running array...


--
Moshe Yudkowsky * [EMAIL PROTECTED] * www.pobox.com/~moshe
 If you're not part of the solution, you're part of the process.
-- Mark A. Johnson


Re: In this partition scheme, grub does not find md information?

2008-01-29 Thread Moshe Yudkowsky

Peter Rabbitson wrote:

[*] The layout is the same but the functionality is different. If you 
have 1+0 on 4 drives, you can survive a loss of 2 drives as long as they 
are part of different mirrors. mdadm -C -l 10 -n 4 -o n2 drives 
however will _NOT_ survive a loss of 2 drives.


In my 4-drive system, I'm clearly not getting 1+0's ability to use grub 
out of the RAID10. I expect it's because I used 1.2 superblocks (why 
not use the latest, I said, foolishly...) and therefore the RAID10 -- 
with an even number of drives -- can't be read by grub. If you'd patch 
that information into the man pages that'd be very useful indeed.


Thanks for your attention to this!

--
Moshe Yudkowsky * [EMAIL PROTECTED] * www.pobox.com/~moshe
 no user serviceable parts below this line
-- From a Perl program by [EMAIL PROTECTED]


Re: In this partition scheme, grub does not find md information?

2008-01-29 Thread Moshe Yudkowsky

Michael Tokarev wrote:


There are more-or-less standard raid LEVELS, including
raid10 (which is the same as raid1+0, or a stripe on top
of mirrors - note it does not mean 4 drives, you can
use 6 - stripe over 3 mirrors each of 2 components; or
the reverse - stripe over 2 mirrors of 3 components each
etc).


Here's a baseline question: if I create a RAID10 array using default 
settings, what do I get? I thought I was getting RAID1+0; am I really?


My superblocks, by the way, are marked version 01; my metadata in 
mdadm.conf asked for 1.2. I wonder what I really got. The real question 
in my mind now is why grub can't find the info, and it's either because 
of 1.2 superblocks or because of the sub-partitioning of components.
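
One way to check what the components actually carry (device name 
illustrative):

mdadm --examine /dev/sda2 | grep -i version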



--
Moshe Yudkowsky * [EMAIL PROTECTED] * www.pobox.com/~moshe
 You may not be interested in war, but war is interested in you.
-- Leon Trotsky


Re: In this partition scheme, grub does not find md information?

2008-01-29 Thread Moshe Yudkowsky

Peter Rabbitson wrote:

It is exactly what the name implies - a new kind of RAID :) The setup 
you describe is not RAID10, it is RAID1+0. As far as how linux RAID10 
works - here is an excellent article: 
http://en.wikipedia.org/wiki/Non-standard_RAID_levels#Linux_MD_RAID_10


Thanks. Let's just say that the md(4) man page was finally penetrating 
my brain, but the Wikipedia article helped a great deal. I had thought 
md's RAID10 was more standard.


--
Moshe Yudkowsky * [EMAIL PROTECTED] * www.pobox.com/~moshe
Rumor is information distilled so finely that it can filter through 
anything.

 --  Terry Pratchett, _Feet of Clay_


Re: In this partition scheme, grub does not find md information?

2008-01-29 Thread Moshe Yudkowsky

Keld Jørn Simonsen wrote:


raid10 has a number of ways to do layout, namely the near, far and
offset ways: layout=n2, f2, o2 respectively.


The default layout, according to --detail, is near=2, far=1. If I 
understand what's been written so far on the topic, that's automatically 
incompatible with 1+0.


--
Moshe Yudkowsky * [EMAIL PROTECTED] * www.pobox.com/~moshe


Re: In this partition scheme, grub does not find md information?

2008-01-29 Thread Moshe Yudkowsky
I'd like to thank everyone who wrote in with comments and explanations. 
And in particular it's nice to see that I'm not the only one who's confused.


I'm going to convert back to the RAID 1 setup I had before for /boot, 2 
hot and 2 spare across four drives. No, that's wrong: 4 hot makes the 
most sense.


And given that RAID 10 doesn't seem to confer (for me, as far as I can 
tell) advantages in speed or reliability -- or the ability to mount just 
one surviving disk of a mirrored pair -- over RAID 5, I think I'll 
convert back to RAID 5, put in a hot spare, and do regular backups (as 
always). Oh, and use reiserfs with data=journal.
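
Roughly, the target layout would be something like this (device names, 
spare assignment, and filesystem commands illustrative):

# small RAID1 across all four disks for /boot (and perhaps /, /etc, ...)
mdadm --create /dev/md0 --level=1 --raid-devices=4 /dev/sd[abcd]1

# RAID5 over three disks plus one hot spare for the rest
mdadm --create /dev/md1 --level=5 --raid-devices=3 \
      --spare-devices=1 /dev/sd[abcd]2

mkfs.reiserfs /dev/md0
mkfs.reiserfs /dev/md1    # mounted with data=journal via /etc/fstab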


Comments back:

Peter Rabbitson wrote:

Maybe you are, depending on your settings, but this is beside the point. 
No matter what 1+0 you have (linux, classic, or otherwise) you can not 
boot from it, as there is no way to see the underlying filesystem 
without the RAID layer.


Sir, thank you for this unequivocal comment. This comment clears up all 
my confusion. I had a wrong mental model of how file system maps work.


With the current state of affairs (available mainstream bootloaders) the 
rule is:

Block devices containing the kernel/initrd image _must_ be either:
* a regular block device (/sda1, /hda, /fd0, etc.)
* or a linux RAID 1 with the superblock at the end of the device 
(0.9 or 1.0)


Thanks even more: 1.0 it is.


This is how you find the actual raid version:

mdadm -D /dev/md[X] | grep Version

This will return a string of the form XX.YY.ZZ. Your superblock version 
is XX.YY.


Ah hah!

Mr. Tokarev wrote:


By the way, on all our systems I use small (256Mb for small-software systems,
sometimes 512M, but 1G should be sufficient) partition for a root filesystem
(/etc, /bin, /sbin, /lib, and /boot), and put it on a raid1 on all...
... doing [it]
this way, you always have all the tools necessary to repair a damaged system
even in case your raid didn't start, or you forgot where your root disk is
etc etc.


An excellent idea. I was going to put just /boot on the RAID 1, but 
there's no reason why I can't add a bit more room and put them all 
there. (Because I was having so much fun on the install, I'm using 4GB 
that I was going to use for swap space to hold a base install, and I'm 
working from there to build the RAID. Same idea.)


Hmmm... I wonder if this more expansive /bin, /sbin, and /lib causes 
hits on the RAID1 drive which ultimately degrade overall performance? 
/lib is hit only at boot time to load the kernel, I'll guess, but /bin 
includes such common tools as bash and grep.



Also, placing /dev on a tmpfs helps a lot to minimize the number of writes
necessary for the root fs.


Another interesting idea. I'm not familiar with using tmpfs (no need, 
until now); but I wonder how you create the devices you need when you're 
doing a rescue.


Again, my thanks to everyone who responded and clarified.

--
Moshe Yudkowsky * [EMAIL PROTECTED] * www.pobox.com/~moshe
Practically perfect people never permit sentiment to muddle their 
thinking.

-- Mary Poppins


Re: In this partition scheme, grub does not find md information?

2008-01-29 Thread Moshe Yudkowsky

Keld Jørn Simonsen wrote:

Based on your reports of better performance on RAID10 -- which are more 
significant than I'd expected -- I'll just go with RAID10. The only 
question now is if LVM is worth the performance hit or not.



I would be interested if you would experiment with this wrt boot time,
for example the difference between /root on a raid5, raid10,f2 and raid10,o2.


According to man md(4), the o2 is likely to offer the best combination 
of read and write performance. Why would you consider f2 instead?


I'm unlikely to do any testing beyond running bonnie++ or something 
similar once it's installed.



--
Moshe Yudkowsky * [EMAIL PROTECTED] * www.pobox.com/~moshe


Re: In this partition scheme, grub does not find md information?

2008-01-29 Thread Moshe Yudkowsky



Hmm, why would you put swap on a raid10? I would in a production
environment always put it on separate swap partitions, possibly a number,
given that a number of drives are available.


I put swap onto non-RAID, separate partitions on all 4 disks.

In a production server, however, I'd use swap on RAID in order to 
prevent server downtime if a disk fails -- a suddenly bad swap can 
easily (will absolutely?) cause the server to crash (even though you can 
boot the server up again afterwards on the surviving swap partitions).


--
Moshe Yudkowsky * [EMAIL PROTECTED] * www.pobox.com/~moshe
 She will have fun who knows when to work
  and when not to work.
-- Segami


Re: In this partition scheme, grub does not find md information?

2008-01-29 Thread Moshe Yudkowsky

Keld Jørn Simonsen wrote:

On Tue, Jan 29, 2008 at 06:32:54PM -0600, Moshe Yudkowsky wrote:

Hmm, why would you put swap on a raid10? I would in a production
environment always put it on separate swap partitions, possibly a number,
given that a number of drives are available.
In a production server, however, I'd use swap on RAID in order to 
prevent server downtime if a disk fails -- a suddenly bad swap can 
easily (will absolutely?) cause the server to crash (even though you can 
boot the server up again afterwards on the surviving swap partitions).


I see. Which file system type would be good for this?
I normally use XFS but maybe another FS is better, given that swap is used
very randomly (read/write).

Will a bad swap crash the system?


Well, Peter says it will, and that's good enough for me. :-)

As for which file system: I would use fdisk to partition the md disk and 
then use mkswap on the partition to make it into a swap partition. It's 
a naive approach but I suspect it's almost certainly the correct one.
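
That is, something along these lines (device name illustrative, and 
assuming a partitionable array):

# after carving out a partition with fdisk on the partitionable md device:
mkswap /dev/md/all3
swapon /dev/md/all3

# plus a matching /etc/fstab line:
#   /dev/md/all3   none   swap   sw   0   0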


--
Moshe Yudkowsky * [EMAIL PROTECTED] * www.pobox.com/~moshe
 There are more ways to skin a cat than nuking it from orbit
-- but it's the only way to be sure.
-- Eliezer Yudkowsky


Unable to eradicate previous version of device information, even with zero-superblock and dd

2008-01-28 Thread Moshe Yudkowsky
I've been trying to bring up a RAID10 device, and I'm having some 
difficulty with automatically-created device names.


mdadm version 2.5.6, Debian Etch.

With metadata=1.2 in my config file,

mdadm --create /dev/md/all --auto=p7 -n 4 --level=10 /dev/sd*2

This does seem to create a RAID array. I see that my /dev/md/ directory 
is populated with all1 through all7.


On reboot, however, I notice that there's suddenly a /dev/md127 
device. Confused, I attempted to start over many times, but I can't seem 
to create a non-"all" array and I can't seem to create a simple 
/dev/md/0 array.


Steps:

To eradicate all prior traces of md configuration, I issue these commands:

mdadm --stop /dev/md/all

which stops.

mdadm --zero-superblock  /dev/sd[each drive]2


I went further (after some trouble) and issued

dd if=/dev/zero of=/dev/sd[each drive]2 count=2M

I then issue:

rm /dev/md* /dev/md/*

The ARRAY information is commented out of the config file (mdadm.conf).

On reboot, I see that the devices /dev/md/all, /dev/md/all1, etc. have 
reappeared, along with /dev/md127, /dev/md_127, and /dev/md_d127.


This is very, very puzzling.

Well, I thought I could work around this. I issued

mdadm --create /dev/md/all

with the same parameters as above. I can use cfdisk and fdisk (either 
one) to create two partitions, /dev/md/all1 and /dev/md/all2.


However,

mkfs.reiserfs /dev/md/all1

claims that /dev/md/all1 has "no such device or address".

ls -l /dev/md/all gives

brw-rw 1 root disk 254, 8129 (date) /dev/md/all1

QUESTIONS:

1. If I create a device called /dev/md/all, should I expect that mdadm 
will create a device called /dev/md/127, and that mdadm --detail --scan 
will report it as /dev/md127 or something similar?


2. How can I completely eradicate all traces of previous work, given 
that zero-superblock and dd on the drives that make up the array doesn't 
seem to erase previous information?




--
Moshe Yudkowsky * [EMAIL PROTECTED] * www.pobox.com/~moshe
 If you're going to shoot, shoot! Don't talk!
   -- Eli Wallach, The Good, the Bad, and the Ugly


Re: Unable to eradicate previous version of device information, even with zero-superblock and dd

2008-01-28 Thread Moshe Yudkowsky



QUESTIONS:

1. If I create a device called /dev/md/all, should I expect that mdadm 
will create a device called /dev/md/127, and that mdadm --detail --scan 
will report it as /dev/md127 or something similar?


That's still happening. However:

2. How can I completely eradicate all traces of previous work, given 
that zero-superblock and dd on the drives that make up the array doesn't 
seem to erase previous information?


Answer:

In order for the md drives to be started on a reboot, update-initramfs 
places information about the current configuration into the boot 
configuration.


In order to eradicate everything, stop all arrays, comment out any ARRAY 
lines in mdadm.conf, remove all md device files, and then issue


update-initramfs

This cleans out the information that's hidden inside the /boot area. On 
the next reboot, no extraneous md files are present. It's then possible 
to issue an mdadm --create /dev/md/all that will create the appropriate 
md devices automatically with proper major and minor device numbers.
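
Spelled out as commands, the sequence looks roughly like this (array and 
device names from my setup; adjust to taste):

mdadm --stop /dev/md/all
mdadm --zero-superblock /dev/sd[abcd]2            # if the components are being reused
sed -i 's/^ARRAY/#ARRAY/' /etc/mdadm/mdadm.conf   # comment out the ARRAY lines
rm /dev/md* /dev/md/*
update-initramfs -u -k all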


To get the md device started correctly at init time, I seem to require 
the use of update-initramfs. I will investigate further when I've got 
some time...



--
Moshe Yudkowsky * [EMAIL PROTECTED] * www.pobox.com/~moshe
 The odds are good, but the goods are odd.
 -- Alaskan women, on the high ratio of men to women in Alaska


In this partition scheme, grub does not find md information?

2008-01-28 Thread Moshe Yudkowsky
I'm finding a problem that isn't covered by the usual FAQs and online 
recipes.


Attempted setup: RAID 10 array with 4 disks.

Because Debian doesn't include RAID10 in its installation disks, I 
created a Debian installation on the first partition of sda, in 
/dev/sda1. Eventually I'll probably convert it to swap, but in the 
meantime that 4G has  a complete 2.6.18 install (Debian stable).


I created a RAID 10 array of four partitions, /dev/md/all, out of 
/dev/sd[abcd]2.


Using fdisk/cfdisk, I created the partition /dev/md/all1 (500 MB) for 
/boot, and the partition /dev/md/all2 with all remaining space in one 
large partition (about 850 GB). That larger partition contains /, /usr, 
/home, etc., each as a separate LVM volume. I copied the usr, var, etc. 
files (but not proc or sys, of course) over to the raid array, mounted 
that array, did a chroot to its root, and started grub.


I admit that I'm no grub expert, but it's clear that grub cannot find 
any of the information in /dev/md/all1. For example,


grub> find /boot/grub/this_is_raid

can't find a file that exists only on the raid array. Grub only searches 
/dev/sda1, not /dev/md/all1.


Perhaps I'm mistaken, but I thought it was possible to boot from 
/dev/md/all1.


I've tried other attacks but without success. For example, also while in 
chroot,


grub-install /dev/md/all2 does not work. (Nor does it work with the 
--root=/boot option.)


I also tried modifications to menu.lst, adding root=/dev/md/all1 to the 
kernel command, but the RAID array's version of menu.lst is never detected.


What I do see is

grub> find /boot/grub/stage1
 (hd0,0)

which indicates (as far as I can tell) that it's found the information 
written on /dev/sda1 and nothing in /dev/md/all1.


Am I trying to do something that's basically impossible?

--
Moshe Yudkowsky * [EMAIL PROTECTED] * www.pobox.com/~moshe


Re: One Large md or Many Smaller md for Better Peformance?

2008-01-22 Thread Moshe Yudkowsky

Carlos Carvalho wrote:


I use reiser3 and xfs. reiser3 is very good with many small files. A
simple test shows interactively perceptible results: removing large
files is faster with xfs, removing large directories (ex. the kernel
tree) is faster with reiser3.


My current main concern about XFS and reiser3 is writebacks. The default 
mode for ext3 is journal, which in case of power failure is more 
robust than the writeback modes of XFS, reiser3, or JFS -- or so I'm 
given to understand.


On the other hand, I have a UPS, so the system should shut down gracefully 
if there's a power failure. I wonder if I'm being too cautious?



--
Moshe Yudkowsky * [EMAIL PROTECTED] * www.pobox.com/~moshe
 Keep some secrets/Never tell,
  And they will keep you very well.
-- Michelle Shocked
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


One Large md or Many Smaller md for Better Peformance?

2008-01-20 Thread Moshe Yudkowsky
Question: with the same number of physical drives,  do I get better 
performance with one large md-based drive, or do I get better 
performance if I have several smaller md-based drives?


Situation: dual CPU, 4 drives (which I will set up as RAID-1 after being 
terrorized by the anti-RAID-5 polemics included in the Debian distro of 
mdadm).


I've two choices:

1. Allocate all the drive space into a single large partition, place 
into a single RAID array (either 10 or 1 + LVM, a separate question).


2. Allocate each drive into several smaller partitions. Make each set of 
smaller partitions into a separate RAID 1 array and use separate RAID md 
drives for the various file systems.


Example use case:

While working on other problems, I download a large torrent in the 
background. The torrent writes to its own, separate file system called 
/foo. If /foo is mounted on its own RAID 10 or 1-LVM array, will that 
help or hinder overall system responsiveness?


It would seem a no-brainer that giving each major filesystem its own 
array would allow for better threading and responsiveness, but I'm 
picking up hints in various pieces of documentation that the performance 
can be counter-intuitive. I've even considered the possibility of giving 
/var and /usr separate RAID arrays (data vs. executables).


If an expert could chime in, I'd appreciate it a great deal.


--
Moshe Yudkowsky * [EMAIL PROTECTED] * www.pobox.com/~moshe
 There are more ways to skin a cat than nuking it from orbit
-- but it's the only way to be sure.
-- Eliezer Yudkowsky


Performance of RAID 10 vs. using LVM?

2008-01-20 Thread Moshe Yudkowsky
Let's assume that I have 4 drives; they are set up in mirrored pairs as 
RAID 1, and then aggregated together to create a RAID 10 system (RAID 1 
followed by RAID 0). That is, 4 disks of size N become a filesystem of 
size 2N.


Question: Is this higher or lower performance than using LVM to 
aggregate the disks?


LVM allows the creation of a unitary file system from disparate physical 
drives, and has the advantage that filesystems can be expanded or shrunk 
with ease. I'll be using LVM on top of the RAID 1 or RAID 10 regardless.


Therefore, I can use LVM to create a "1L" system, to coin an acronym. 
This would have the same 2N size, but would be created by LVM instead of 
RAID 0. Is there a performance advantage to using RAID 10 instead of 
RAID 1L? (The other question is whether the hypothetical performance 
advantage of 10 outweighs the flexibility advantage of 1L, a question that 
only an individual user can answer... perhaps.)
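
Sketched as commands, the two alternatives I'm weighing look something 
like this (names and sizes illustrative):

# RAID 10 directly across the four disks
mdadm --create /dev/md0 --level=10 --raid-devices=4 /dev/sd[abcd]1

# versus "RAID 1L": two RAID 1 pairs glued together with LVM
mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sd[ab]1
mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/sd[cd]1
pvcreate /dev/md1 /dev/md2
vgcreate vg0 /dev/md1 /dev/md2
lvcreate -l 100%FREE -n data vg0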


Comments extremely welcome.

--
Moshe Yudkowsky * [EMAIL PROTECTED] * www.pobox.com/~moshe
 The sharpest knives are also the quietest.
 -- John M. Ford, _The Final Reflection_


Re: One Large md or Many Smaller md for Better Performance?

2008-01-20 Thread Moshe Yudkowsky

Bill Davidsen wrote:

One partitionable RAID-10, perhaps, then partition as needed. Read the 
discussion here about performance of LVM and RAID. I personally don't do 
LVM unless I know I will have to have great flexibility of configuration 
and can give up performance to get it. Others report different results, 
so make up your own mind.


I've used Google to search (again) through the archives of the 
newsgroup, and "performance lvm" turns up relevant discussions back in 
2004 or so, but nothing very recent. Am I missing some other location 
for discussions of this question, or perhaps I'm looking in the wrong 
places? (The Wiki didn't help either.)


--
Moshe Yudkowsky * [EMAIL PROTECTED] * www.pobox.com/~moshe
Rumor is information distilled so finely that it can filter through 
anything.

 --  Terry Pratchett, _Feet of Clay_


Re: One Large md or Many Smaller md for Better Peformance?

2008-01-20 Thread Moshe Yudkowsky

Thanks for the tips, and in particular:

Iustin Pop wrote:


  - if you download torrents, fragmentation is a real problem, so use a
filesystem that knows how to preallocate space (XFS and maybe ext4;
for XFS use xfs_io to set a bigger extent size for where you
download)


That's a very interesting idea; it also gives me an opportunity to 
experiment with XFS. I had been avoiding it because of possible 
power-failure issues on writes.
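
If I do try it, Iustin's suggestion presumably comes down to something like 
the following (directory and size are illustrative):

# set a larger extent-size hint on the download directory; files created
# there afterwards inherit it, which reduces fragmentation
xfs_io -c 'extsize 64m' /foo/downloads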


--
Moshe Yudkowsky * [EMAIL PROTECTED] * www.pobox.com/~moshe
 She will have fun who knows when to work
  and when not to work.
-- Segami
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html