Re: raid6 array, part id 'fd' not assembling at boot.
On 19 Mar 2007, James W. Laferriere outgrabe:
> What I don't see is the reasoning behind the use of initrd. It's a
> kernel run to put the dev tree in order, start up devices, ... just
> to start the kernel again?

That's not what initrds do. No second kernel is started, and
constructing /dev is not one of the jobs of initrds in any case.
(There *is* something that runs a second kernel if the first one dies
--- google for `kexec crash dump' --- but it's entirely different in
design and intent from initrds, and isn't an early-boot thing but a
kernel-crash-reporting thing.)

There are three different ways to enter userspace for the first time.

 - You can boot with an initramfs. This is the recommended way and may
   eventually deprecate all the others. initramfses consist of gzipped
   cpio archives, either constructed by hand or built automatically by
   the kernel build system during the build process; and either linked
   into the kernel image or pointed at by the bootloader as if they
   were initrds (or both: the two images are automatically merged).
   These are extracted into the `rootfs', which is the (nonswappable)
   ramfs filesystem at the root of the mount tree. A minimal rootfs
   (with nothing on it) is linked into the kernel if nothing else is.
   If an executable /init exists in the initramfs, it is run to switch
   to userspace. You switch from the rootfs to the real root
   filesystem once you have mounted it, by erasing everything on the
   rootfs and `exec chroot'ing and/or `mount --move'ing it into place.
   (busybox contains a `switch_root' built-in command to do this.)

   (I prefer directly linking an initramfs into the kernel, because
   the kernel image is still stand-alone then and you don't have to
   engage in messes involving tracking which initramfs archive is used
   by which kernel if you run multiple kernels.)

 - You can boot with an initrd, which is a compressed *filesystem
   image* loaded from an external file (which the kernel is pointed at
   by the bootloader). The kernel runs /linuxrc to switch to
   userspace, and userspace should use the `pivot_root' command to
   flip over to the real root filesystem. (There is an older way of
   switching roots involving echoing device numbers into a file under
   /proc. Ignore it, it's disgusting.)

   In both these cases it is the initramfs's or initrd's
   responsibility to parse things like the root= and init= kernel
   command-line parameters (and any new ones that you choose to
   define).

   (This is a far older method than initramfs, which explains the
   apparent duplication of effort. initramfs arose largely out of
   dissatisfaction with the limitations of initrds.)

 - You can boot with neither. In this case the kernel mounts / for
   you, either from a local block device, from auto-assembled md
   arrays with v0.90 superblocks, or remotely off NFS. Because it
   doesn't fsck the root filesystem before mounting it, this is
   slightly risky compared to the other options (where your
   initramfs/initrd image can fsck before mounting as usual).
   (initramfs archives are safest of all here, because that filesystem
   is newly constructed by the kernel at boot time, so it is
   *impossible* for it to be damaged.)

   This option is the one where the RAID auto-assembly kicks in, and
   the only one so inflexible that such a thing is needed. H. Peter
   Anvin has an ongoing project to move everything this option does
   into a default initramfs, and remove this crud from the kernel
   entirely. When that happens, there'll be little excuse for
   assembling RAID arrays using the auto-assembler :)
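To make the construction step concrete, here is a sketch of building a
gzipped cpio archive by hand, and of linking one into the kernel as
preferred above. The directory and file names are invented for
illustration, not taken from this thread:

  # Build an initramfs archive by hand; the archive must be in newc
  # cpio format (directory and output path are hypothetical examples):
  cd ~/initramfs-root            # tree containing /init, /bin/busybox, ...
  find . | cpio -o -H newc | gzip > /boot/initramfs.cpio.gz

  # Or let the kernel build system construct it and link it straight
  # into the image, so the kernel stays stand-alone:
  #   CONFIG_INITRAMFS_SOURCE="/home/me/initramfs-root"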
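And a minimal /init in the spirit of the switch_root description
above, assuming a busybox-based initramfs that assembles an md array
holding the root filesystem. Everything specific here (/dev/md3, the
member devices, the directory layout) is an assumption for the sake of
the example:

  #!/bin/sh
  # Minimal busybox /init sketch. Assumes the initramfs contains
  # busybox, mdadm, and empty /proc, /sys and /newroot directories.
  mount -t proc proc /proc
  mount -t sysfs sysfs /sys
  mdev -s                      # populate /dev from sysfs (busybox mdev)

  # Assemble the array from userspace, rather than relying on
  # in-kernel 0xFD auto-assembly:
  mdadm --assemble /dev/md3 /dev/sd[cdefgh]1

  # Mount the real root, then erase the rootfs and exec into it;
  # busybox's switch_root does the erase-and-chroot dance described
  # above.
  mount -o ro /dev/md3 /newroot
  umount /sys /proc
  exec switch_root /newroot /sbin/init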
> In other words, I believe that initrds are essentially pointless.
> But that's just my opinion.

It's wrong, sorry. Try mounting / on a RAID array atop LVM partially
scattered across the network via the network block device, for
instance (I was running like this for some time, after some
unfortunate disk failures left me with too little storage on one
critical machine to store all the stuff it needed to run). Hell, try
mounting / on LVM at all. You need userspace to get LVM up and
running, so you *need* an initrd or initramfs to do that.

--
`In the future, company names will be a 32-character hex string.'
   --- Bruce Schneier on the shortage of company names
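For the LVM-on-root case, the userspace steps an initramfs has to
perform look roughly like this; the volume group "vg0" and logical
volume "root" are invented names, not anything from this thread:

  # Extra initramfs steps for / on LVM (names are hypothetical):
  lvm vgscan --mknodes        # scan for volume groups, create device nodes
  lvm vgchange -ay vg0        # activate the VG so /dev/vg0/root appears
  mount -o ro /dev/vg0/root /newroot
  exec switch_root /newroot /sbin/init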
Re: raid6 array, part id 'fd' not assembling at boot.
On Saturday March 17, [EMAIL PROTECTED] wrote:
> Neil Brown wrote:
>> In-kernel auto-assembly using partition type 0xFD only works for
>> metadata=0.90. This is deliberate.
>> Don't use 0xFD partitions. Use mdadm to assemble your array, either
>> via an initrd or (if it doesn't hold the root filesystem) via an
>> init.d script.
>
> Could you clarify why someone thought it was a good idea to make it
> complex for users to move to current versions of the superblock?
> Having worked with users for way too many years, expecting end users
> to diddle init scripts without shooting themselves in the foot is
> optimism not justified by past results. At least past results as
> observed by me ;-)

That's a loaded question, isn't it? Of course no-one thought it was a
good idea to make life complex for anyone.

However I do not want to perpetuate a past design mistake of
auto-assembling RAID arrays based solely on partition type, and I
didn't want to burden version-1 superblocks with the information
required to support that. So I didn't.

But neither am I forcing anyone to use version-1 metadata. Most of
the new functionality I have made available in v-1 metadata has also
been added to v-0.90 metadata (not quite all, but there are very few
needs that would drive someone to use v-1).

If someone is keen to use the newest features, then I am happy with
that, and am happy to provide support and advice. In doing so I learn
about ways that mdadm can be improved to make life easier. But if you
want to use the newest features, you need to understand all the
implications thereof.

As a contrast, Debian does force (or strongly encourage) people to
use version-1 metadata by putting "CREATE metadata=1" in
/etc/mdadm/mdadm.conf. But then Debian also provides all the
infrastructure for building an initrd that assembles md arrays for
you quite smoothly. So they provide a complete package that just
works (most of the time).

I primarily provide functionality. It needs to work for everyone:
those with legacy configurations that I would not recommend using on
new systems, and those who build new systems with different
requirements. I have to provide a variety of options. It is up to the
system integrator to choose which bits of functionality to use.

It would be good to create a document discussing the various issues
and setting out the preferred config approach for new systems, and I
have considered doing this, but unfortunately it hasn't happened yet.
It would suggest:

 - If root/swap are on an md device, use an initrd to assemble those
   (swap is needed for resume-from-hibernate).
 - Set homehost in mdadm.conf and use "mdadm -As" to auto-assemble
   everything that is meant to be assembled on this host.
 - Assemble all arrays as partitionable.
 - Use version-1.1 metadata (superblocks at the start cause less
   confusion, I think).
 - Run 'repair' every month and don't worry about the mismatch_cnt.

That's all I can think of at the moment.

NeilBrown
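One possible rendering of that list in practice; the hostname "myhost"
and the example device are assumptions added for illustration, not
values from the thread:

  # /etc/mdadm.conf -- illustrative sketch of the recommendations above
  DEVICE partitions          # consider everything in /proc/partitions
  HOMEHOST myhost            # arrays tagged for this host auto-assemble
  CREATE metadata=1.1        # new arrays get v1.1 superblocks (at the start)

  # Boot-time assembly of everything meant for this homehost, as
  # partitionable arrays:
  #   mdadm -As --auto=part
  #
  # Monthly 'repair' (e.g. from cron; md3 is an example device):
  #   echo repair > /sys/block/md3/md/sync_action
  #   cat /sys/block/md3/md/mismatch_cnt   # informational; don't worry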
Re: raid6 array, part id 'fd' not assembling at boot.
Hello Neil, Bill,

On Sun, 18 Mar 2007, Bill Davidsen wrote:
> Neil Brown wrote:
>> On Saturday March 17, [EMAIL PROTECTED] wrote:
>>> Neil Brown wrote:
>>>> In-kernel auto-assembly using partition type 0xFD only works for
>>>> metadata=0.90. This is deliberate.
>>>> Don't use 0xFD partitions. Use mdadm to assemble your array,
>>>> either via an initrd or (if it doesn't hold the root filesystem)
>>>> via an init.d script.
>>>
>>> Could you clarify why someone thought it was a good idea to make
>>> it complex for users to move to current versions of the
>>> superblock? Having worked with users for way too many years,
>>> expecting end users to diddle init scripts without shooting
>>> themselves in the foot is optimism not justified by past results.
>>> At least past results as observed by me ;-)
>>
>> That's a loaded question, isn't it? Of course no-one thought it was
>> a good idea to make life complex for anyone.
>
> Note smiley!
>
>> However I do not want to perpetuate a past design mistake of
>> auto-assembling RAID arrays based solely on partition type, and
>> didn't want to burden version-1 superblocks with the information
>> required to support that. So I didn't.
>
> Having something as critical as booting the system depend on an init
> script of any description just seems inherently less reliable than
> having the kernel do the job. Scripts depend on an interpreter, the
> interpreter depends on the library, and while failure is rare, it's
> not unheard of. So I respectfully disagree on assembly in the kernel
> being a design mistake, but /boot is not a problem with a 0.90
> superblock, so it's not holding anything back.

I agree with you, Bill, partially, but I agree much less with Neil on
this subject. Neil suggests that the kernel is the wrong place to
assemble arrays. Neil, is that a correct statement? But you did
qualify that by saying 'by partition type' in a previous email. Bill
suggests that shell scripts of any type 'can' fail.

Why can't we have assembly of arrays by UUID in the kernel? The UUIDs
could be placed on the boot command line. Yes, I know it's a limited
resource, but a viable resource still. Or even some sort of signature
(well, a UUID is a signature in its way), small enough to be put on
the boot command line or somewhere the kernel can read it, or even in
the kernel itself. More below, for my reasoning on the above.

>> But neither am I forcing anyone to use version-1 metadata. Most of
>> the new functionality I have made available in v-1 metadata has
>> also been added to v-0.90 metadata (not quite all, but there are
>> very few needs that would drive someone to use v-1).
>
> With superblocks as with real estate, location is important. My
> prime reason for preferring 1.1 metadata (and I wish there was a
> painless way to update old arrays).
>
>> If someone is keen to use the newest features, then I am happy with
>> that, and am happy to provide support and advice. In doing so I
>> learn about ways that mdadm can be improved to make life easier.
>> But if you want to use the newest features, you need to understand
>> all the implications thereof.
>>
>> As a contrast, Debian does force (or strongly encourage) people to
>> use version-1 metadata by putting "CREATE metadata=1" in
>> /etc/mdadm/mdadm.conf. But then Debian also provides all the
>> infrastructure for building an initrd that assembles md arrays for
>> you quite smoothly. So they provide a complete package that just
>> works (most of the time).
>>
>> I primarily provide functionality. It needs to work for everyone:
>> those with legacy configurations that I would not recommend using
>> on new systems, and those who build new systems with different
>> requirements.
>> I have to provide a variety of options. It is up to the system
>> integrator to choose which bits of functionality to use.
>>
>> It would be good to create a document discussing the various issues
>> and setting out the preferred config approach for new systems, and
>> I have considered doing this, but unfortunately it hasn't happened
>> yet. It would suggest:
>>
>>  - If root/swap are on an md device, use an initrd to assemble
>>    those (swap is needed for resume-from-hibernate).
>>  - Set homehost in mdadm.conf and use "mdadm -As" to auto-assemble
>>    everything that is meant to be assembled on this host.
>>  - Assemble all arrays as partitionable.
>>  - Use version-1.1 metadata (superblocks at the start cause less
>>    confusion, I think).
>>  - Run 'repair' every month and don't worry about the mismatch_cnt.
>>
>> That's all I can think of at the moment.
>
> I'm not sure I see the advantage of partitionable arrays for most
> things, and since it's likely that 90+% of users will do what their
> distribution install does for them, this sounds like a best
> practices document.
>
> Any plans to make 'repair' on RAID levels 5, 6, 10 check to attempt
> to identify the bad chunk before rewriting? I hope you vote on
> RAID-1 with >2 copies, and rewrite the odd man out.

What I don't see is the reasoning behind the use of initrd. It's a
kernel run to put the dev tree in order, start up devices, ... just
to start the kernel again? In other words, I believe that initrds are
essentially pointless. But that's just my opinion.
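For comparison with the in-kernel-by-UUID idea: the kernel's existing
md= boot option takes an explicit device list rather than a UUID, so
UUID matching currently lives in mdadm. A sketch of the userspace
equivalent follows; the UUID and member devices are placeholders, not
values from this thread:

  # Assemble by UUID from userspace: mdadm picks out, from the listed
  # devices, only the members whose superblock UUID matches:
  mdadm --assemble /dev/md3 --uuid=c3d52003:9a8f1b72:4f2e8c11:9abcdef0 \
        /dev/sd[cdefgh]1

  # Or record it once in mdadm.conf and let "mdadm -As" match on it:
  #   ARRAY /dev/md3 UUID=c3d52003:9a8f1b72:4f2e8c11:9abcdef0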
Re: raid6 array, part id 'fd' not assembling at boot.
On Friday March 16, [EMAIL PROTECTED] wrote:
> Hello All,
> I am having a dickens of a time preparing this system to replace my
> present one. I created a raid6 array over 6 147GB scsi drives. The
> steps I followed were:
>
>   fdisk /dev/sd[c-h]     (one at a time, of course)
>     created a partition starting at cyl 2, 10 cyls from the end of
>       the drive
>     typed the partition type FD
>     w
>   repeat until all six drives partitioned
>
>   mdadm --create /dev/md3 --chunk=64 --metadata=1.2 --verbose \
>     --bitmap=internal --level=6 --raid-devices=6 --spare-devices=0 \
>     /dev/sd[cdefgh]1
>
> Built just fine.

In-kernel auto-assembly using partition type 0xFD only works for
metadata=0.90. This is deliberate.

Don't use 0xFD partitions. Use mdadm to assemble your array, either
via an initrd or (if it doesn't hold the root filesystem) via an
init.d script.

NeilBrown
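To make that suggestion concrete for this particular array, an
init.d-style assembly could look roughly like this; the UUID shown is
a placeholder for whatever the superblocks actually contain:

  # Generate an ARRAY line from the on-disk superblocks (run once,
  # and review the output before appending):
  mdadm --examine --scan >> /etc/mdadm.conf
  #  => something like:
  #  ARRAY /dev/md3 metadata=1.2 UUID=xxxxxxxx:xxxxxxxx:xxxxxxxx:xxxxxxxx

  # After that, the init.d script (or initrd) needs only:
  mdadm --assemble --scan /dev/md3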