Re: raid6 array , part id 'fd' not assembling at boot .

2007-03-31 Thread Nix
On 19 Mar 2007, James W. Laferriere outgrabe:
   What I don't see is the reasoning behind the use of initrd.  It's a
   kernel run to put the dev tree in order, start up devices, ... just to
   start the kernel again?

That's not what initrds do. No second kernel is started, and
constructing /dev is not one of the jobs of initrds in any case. (There
*is* something that runs a second kernel if the first one dies ---
google for `kexec crash dump' --- but it's entirely different in design
and intent from initrds, and isn't an early-boot thing but a kernel-
crash-reporting thing.)

There are three different ways to enter userspace for the first time.

 - You can boot with an initramfs. This is the recommended way and may
   eventually deprecate all the others. initramfses consist of gzipped
   cpio archives, either constructed by hand or built automatically
   by the kernel build system during the build process; and either
   linked into the kernel image or pointed at by the bootloader as if
   they were initrds (or both: the two images are automatically merged).
   These are extracted into the `rootfs', the (nonswappable) ramfs
   filesystem at the root of the mount tree. A minimal rootfs
   (with nothing on it) is linked into the kernel if nothing else is.

   If the initramfs contains an executable /init, it is run to switch to
   userspace. Once you have mounted the real root filesystem, you switch
   to it from the rootfs by erasing everything on the rootfs and
   `mount --move'ing and/or `exec chroot'ing the new root into place.
   (busybox contains a `switch_root' built-in command to do this.)

   (I prefer directly linking an initramfs into the kernel, because the
   kernel image is still stand-alone then and you don't have to engage
   in messes involving tracking which initramfs archive is used by which
   kernel if you run multiple kernels.)
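
   (To make that concrete, a minimal /init for an initramfs might look
   roughly like the sketch below. This assumes a busybox environment and
   a root filesystem on /dev/sda1; both are placeholders, not anything
   specified above:

       #!/bin/sh
       # mount the pseudo-filesystems the later commands rely on
       mount -t proc  proc  /proc
       mount -t sysfs sysfs /sys
       # mount the real root read-only under a directory on the rootfs
       mkdir -p /newroot
       mount -o ro /dev/sda1 /newroot
       # empty the rootfs, move /newroot on top of it, exec the real init
       exec switch_root /newroot /sbin/init

   switch_root has to run as PID 1, which it does when invoked from /init.)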

 - You can boot with an initrd, which is a compressed *filesystem image*
   loaded from an external file (which the kernel is pointed at by the
   bootloader). The kernel runs /linuxrc to switch to userspace, and
   userspace should use the `pivot_root' command to flip over to the real
   root filesystem. (There is an older way of switching roots involving
   echoing device numbers into a file under /proc. Ignore it, it's
   disgusting.)

   In both these cases it is the initramfs / initrd's responsibility to
   parse things like the root= and init= kernel command-line parameters
   (and any new ones that you choose to define).

   (This is a far older method than initramfs, which explains the
   apparent duplication of effort. initramfs arose largely out of
   dissatisfaction with the limitations of initrds.)
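
   (For comparison, a /linuxrc for an initrd typically follows the classic
   pivot_root pattern, roughly like this sketch; /dev/sda1 is again only a
   placeholder for the real root device:

       #!/bin/sh
       mount -t proc proc /proc
       # assemble arrays / activate volumes the real root needs here, then:
       mount -o ro /dev/sda1 /new-root
       cd /new-root
       # swap roots: the old initrd root ends up under /new-root/initrd
       pivot_root . initrd
       exec chroot . /sbin/init < dev/console > dev/console 2>&1

   The /new-root and /new-root/initrd directories must already exist.)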

 - You can boot with neither. In this case the kernel mounts / for you,
   either from a local block device, from auto-assembled md arrays with
   v0.90 superblocks, or remotely off NFS. Because it doesn't fsck the
   root filesystem before mounting it, this is slightly risky compared
   to the other options (where your initramfs/initrd image can fsck
   before mounting as usual). (initramfs archives are safest of all here
   because the filesystem is newly constructed by the kernel at boot
   time, so it is *impossible* for it to be damaged.)

   This option is the one where the RAID auto-assembly kicks in, and the
   only one so inflexible that such is needed. H. Peter Anvin has an
   ongoing project to move everything this option does into a default
   initramfs, and remove this crud from the kernel entirely.

   When that happens, there'll be little excuse for assembling RAID
   arrays using the auto-assembler :)
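
   (Under that scheme the bootloader needs nothing more than the root
   device on the kernel command line; something like this grub stanza,
   where the kernel version and md device are of course placeholders:

       title  Linux
       root   (hd0,0)
       kernel /boot/vmlinuz-2.6.20 root=/dev/md0 ro

   The kernel then scans for 0xFD partitions carrying v0.90 superblocks,
   assembles /dev/md0 itself, and mounts it as /.)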

   In other words, I believe that initrds are essentially pointless.  But
   that's just my opinion.

It's wrong, sorry. Try mounting / on a RAID array atop LVM partially
scattered across the network via the network block device, for instance
(I was running like this for some time after some unfortunate disk
failures left me with too little storage on one critical machine to
store all the stuff it needed to run).

Hell, try mounting / on LVM at all. You need userspace to get LVM up
and running, so you *need* an initrd or initramfs to do that.
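
For instance, the interesting part of an initramfs /init for a root
filesystem on LVM (possibly itself on md) looks roughly like the sketch
below; the volume group and LV names are placeholders, and device-node
setup is glossed over:

    #!/bin/sh
    mount -t proc  proc  /proc
    mount -t sysfs sysfs /sys
    # assemble any arrays the volume group sits on, then activate it
    mdadm --assemble --scan
    vgchange -ay vg0
    # mount the root LV and hand over to the real init
    mkdir -p /newroot
    mount -o ro /dev/vg0/root /newroot
    exec switch_root /newroot /sbin/init

None of that can happen inside the kernel, which is the point.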

-- 
`In the future, company names will be a 32-character hex string.'
  --- Bruce Schneier on the shortage of company names


Re: raid6 array , part id 'fd' not assembling at boot .

2007-03-18 Thread Neil Brown
On Saturday March 17, [EMAIL PROTECTED] wrote:
 Neil Brown wrote:
 
  In-kernel auto-assembly using partition type 0xFD only works for
  metadata=0.90.  This is deliberate.
 
  Don't use 0xFD partitions.  Use mdadm to assemble your array, either
  via an initrd or (if it doesn't hold the root filesystem) via an init.d
  script.
 Could you clarify why someone thought it was a good idea to make it 
 complex for users to move to current versions of the superblock? Having 
 worked with users for way too many years, expecting end users to diddle 
 init scripts without shooting themselves in the foot is optimism not 
 justified by past results. At least past results as observed by me ;-)

That's a loaded question isn't it?  Of course no-one thought it was a
good idea to make life complex for anyone.

However I do not want to perpetuate a past design mistake of
auto-assembling raid arrays based solely on partition type, and didn't
want to burden version-1 superblocks with the information required to
support that.  So I didn't. 

But neither am I forcing anyone to use version-1 metadata.  Most of
the new functionality I have made available in v-1 metadata has also
been added to v-0.90 metadata (not quite all, but there are very few
needs that would drive someone to use v-1).

If someone is keen to use the newest features, then I am happy with
that, and am happy to provide support and advice.  In doing so I learn
about ways that mdadm can be improved to make life easier.  But if you
want to use the newest features, you need to understand all the
implications thereof.

By contrast, Debian forces (or at least strongly encourages) people to use
version-1 metadata by putting CREATE metadata=1 in
/etc/mdadm/mdadm.conf.
But then Debian also provides all the infrastructure for building an
initrd that assembles md arrays for you quite smoothly.   So they
provide a complete package that just works (most of the time).

I primarily provide functionality.  It needs to work for everyone:
those with legacy configurations that I would not recommend using on
new systems, and those who build new systems with different
requirements.  I have to provide a variety of options. It is up to the
system integrator to choose which bits of functionality to use.

It would be good to create a document discussing the various issues and
setting out the preferred config approach for new systems, and I have
considered doing this, but unfortunately it hasn't happened yet.

It would suggest:
 - If root/swap are on an md device, use an initrd to assemble those
   (swap needed for resume-from-hibernate)
 - Set homehost in mdadm.conf and use mdadm -As to auto-assemble
   everything that is meant to be assembled on this host.
 - Assemble all arrays as partitionable.
 - Use version-1.1 metadata (superblocks at the start cause less
   confusion I think)
 - run 'repair' every month and don't worry about the mismatch_cnt.
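
A config along those lines might look roughly like this sketch (the
hostname and UUID are placeholders, not values from this thread):

    # /etc/mdadm/mdadm.conf
    HOMEHOST myhost
    CREATE metadata=1.1
    ARRAY /dev/md_d0 UUID=aaaaaaaa:bbbbbbbb:cccccccc:dddddddd

with "mdadm -As" run from the initrd or an init.d script, and a monthly
cron job doing something like

    echo repair > /sys/block/md_d0/md/sync_action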

That's all I can think of at the moment.

NeilBrown


Re: raid6 array , part id 'fd' not assembling at boot .

2007-03-18 Thread Mr. James W. Laferriere

Hello Neil & Bill ,

On Sun, 18 Mar 2007, Bill Davidsen wrote:

Neil Brown wrote:

On Saturday March 17, [EMAIL PROTECTED] wrote:

Neil Brown wrote:

In-kernel auto-assembly using partition type 0xFD only works for
metadata=0.90.  This is deliberate.

Don't use 0xFD partitions.  Use mdadm to assemble your array, either
via an initrd or (if it doesn't hold the root filesystem) via an init.d
script.

Could you clarify why someone thought it was a good idea to make it 
complex for users to move to current versions of the superblock? Having 
worked with users for way too many years, expecting end users to diddle 
init scripts without shooting themselves in the foot is optimism not 
justified by past results. At least past results as observed by me ;-)




That's a loaded question isn't it?  Of course no-one thought it was a
good idea to make life complex for anyone.


Note smiley!

However I do not want to perpetuate a past design mistake of
auto-assembling raid arrays based solely on partition type, and didn't
want to burden version-1 superblocks with the information required to
support that.  So I didn't. 
Having something as critical as booting the system depend on an init script 
of any description just seems inherently less reliable than having the kernel 
do the job. Scripts depend on an interpreter, the interpreter depends on the 
library, and while failure is rare, it's not unheard of. So I respectfully 
disagree on assembly in the kernel being a design mistake, but /boot is not a 
problem with a 0.90 superblock, so it's not holding anything back.


I agree with you, Bill, partially.
But I agree much less with Neil on this subject.
Neil suggests that the kernel is the wrong place to assemble arrays.
Neil, is that a correct statement?  But you did stipulate that by
saying 'by partition type' in a previous email.

Bill suggests that shell scripts of any type 'can' fail.

Why can't we have assembly of arrays by UUID in the kernel?

The UUIDs could be placed on the boot command line.  Yes, I know it's
a limited resource, but it is a viable resource still.
Or even some sort of signature; well, that's a UUID in its way.
Some sort of signature, small enough to be put on the boot command
line or somewhere the kernel can read it, or even in the kernel itself.
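
(For reference, mdadm can already assemble by UUID from userspace; a
sketch, with a placeholder UUID and the device names from the original
report:

    mdadm --assemble /dev/md3 --uuid=aaaaaaaa:bbbbbbbb:cccccccc:dddddddd \
        /dev/sd[cdefgh]1

The question above is about an in-kernel equivalent of that.)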

More below, for my reasoning of the above.


But neither am I forcing anyone to use version-1 metadata.  Most of
the new functionality I have made available in v-1 metadata has also
been added to v-0.90 metadata (not quite all, but there are very few
needs that would drive someone to use v-1).

With superblocks as with real estate, location is important. That is my prime 
reason for preferring 1.1 metadata (and I wish there were a painless way to 
update old arrays).

If someone is keen to use the newest features, then I am happy with
that, and am happy to provide support and advice.  In doing so I learn
about ways that mdadm can be improved to make life easier.  But if you
want to use the newest features, you need to understand all the
implications thereof.

By contrast, Debian forces (or at least strongly encourages) people to use
version-1 metadata by putting CREATE metadata=1 in
/etc/mdadm/mdadm.conf.
But then Debian also provides all the infrastructure for building an
initrd that assembles md arrays for you quite smoothly.   So they
provide a complete package that just works (most of the time).

I primarily provide functionality.  It needs to work for everyone:
those with legacy configurations that I would not recommend using on
new systems, and those who build new systems with different
requirements.  I have to provide a variety of options. It is up to the
system integrator to choose which bits of functionality to use.

It would be good to create a document discussing the various issues and
setting out the preferred config approach for new systems, and I have
considered doing this, but unfortunately it hasn't happened yet.

It would suggest:
 - If root/swap are on an md device, use an initrd to assemble those
   (swap needed for resume-from-hibernate)
 - Set homehost in mdadm.conf and use mdadm -As to auto-assemble
   everything that is meant to be assembled on this host.
 - Assemble all arrays as partitionable.
 - Use version-1.1 metadata (superblocks at the start cause less
   confusion I think)
 - run 'repair' every month and don't worry about the mismatch_cnt.

That's all I can think of at the moment.

I'm not sure I see the advantage of partitionable arrays for most things, and 
since it's likely that 90+% of users will do what their distribution install 
does for them, this sounds like a 'best practices' document. Any plans to 
make 'repair' on RAID levels 5, 6 and 10 attempt to identify the bad 
chunk before rewriting? I hope you vote on RAID-1 with more than 2 copies, 
and rewrite the odd man out.


What I don't see is the reasoning behind the use of initrd.  It's a
kernel run to put the dev tree in order, start up devices, ... just to
start the kernel again?  In other words, I believe that initrds are
essentially pointless.  But that's just my opinion.

Re: raid6 array , part id 'fd' not assembling at boot .

2007-03-16 Thread Neil Brown
On Friday March 16, [EMAIL PROTECTED] wrote:
   Hello All,  I am having a dickens of a time preparing this system
 to replace my present one.
   I created a raid6 array over 6 147GB scsi drives.
   The steps I followed were:
 
   fdisk /dev/sd[c-h]  (one at a time, of course)
     created a partition starting at cyl 2, ending 10 cyls from the end
     of the drive
     set the partition type to FD
     w
   repeated until all six drives were partitioned.
 
   mdadm --create /dev/md3 --chunk=64 --metadata=1.2 --verbose \
     --bitmap=internal --level=6 --raid-devices=6 --spare-devices=0 \
     /dev/sd[cdefgh]1
   It built just fine.

In-kernel auto-assembly using partition type 0xFD only works for
metadata=0.90.  This is deliberate.

Don't use 0xFD partitions.  Use mdadm to assemble your array, either
via an initrd or (if it doesn't hold the root filesystem) via an init.d
script.
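
For a non-root array like the one above, that boils down to something
roughly like this sketch (the config path may differ by distribution):

    # record the array so it can be assembled by UUID later
    mdadm --examine --scan >> /etc/mdadm.conf

    # then, from an init.d script (or from an initrd if it held /):
    mdadm --assemble --scan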

NeilBrown