On 26/08/2014 10:38, Peter Humphrey wrote:
On Monday 25 August 2014 18:46:23 Kerin Millar wrote:
On 25/08/2014 17:51, Peter Humphrey wrote:
On Monday 25 August 2014 13:35:11 Kerin Millar wrote:
I now wonder if this is a race condition between the init script running
`mdadm -As` and the fact that the mdadm package installs udev rules that
allow for automatic incremental assembly.
Isn't it just that, with the kernel auto-assembly of the root partition,
and udev rules having assembled the rest, all the work's been done by the
time the mdraid init script is called? I had wondered about the time that
udev startup takes; assembling the raids would account for it.
Yes, it's a possibility and would constitute a race condition - even
though it might ultimately be a harmless one.
I thought a race involved the competitors setting off at more-or-less the same
time, not one waiting until the other had finished. No matter.
The mdraid script can assemble arrays and runs at a particular point in
the boot sequence. The udev rules can also assemble arrays and, being
event-driven, I suspect that they are likely to prevail. The point is
that both the sequence and the timing of these two mechanisms are
non-deterministic. There is definitely the potential for a race condition. I
just don't yet know whether it is a harmful race condition.
As touched upon in the preceding post, I'd really like to know why mdadm
sees fit to return a non-zero exit code given that the arrays are actually
assembled successfully.
I can see why a dev might think "I haven't managed to do my job" here.
It may be that mdadm returns different non-zero exit codes depending on
the exact circumstances. It does have this characteristic for certain
other operations (such as --detail --test).
After all, even if the arrays are already partially or fully assembled
at the point that mdadm is executed by the mdraid init script, it surely
ought not to matter. As long as the arrays are fully assembled by the time
mdadm exits, it should return 0 to signify success. Nothing else makes
sense, in my opinion. It's absurd that the mdraid script is drawn into
printing a blank error message where nothing has gone wrong.
I agree, that is absurd.
Further, the mdadm ebuild still prints elog messages stating that mdraid
is a requirement for the boot runlevel but, with udev rules, I don't see
how that can be true. With udev being event-driven and calling mdadm
upon the introduction of a new device, the array should be up and
running as of the very moment that all the disks are seen, no matter
whether the mdraid init script is executed or not.
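For reference, the incremental assembly hook in the mdadm-provided rules
is roughly of the following shape. This is a paraphrased sketch, not the
literal file; the installed 64-md-raid-assembly.rules is longer and
varies between versions:

```
# Paraphrased sketch of mdadm's incremental assembly udev rule.
# When the kernel announces a new block device whose signature marks it
# as a RAID member, mdadm is invoked to slot it into its array; once the
# last member appears, the array goes live.
SUBSYSTEM=="block", ACTION=="add", ENV{ID_FS_TYPE}=="linux_raid_member", \
    RUN+="/sbin/mdadm --incremental $env{DEVNAME}"
```

Which is consistent with the observation above: once every member disk
has been seen, the array is up regardless of whether the mdraid init
script ever runs.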
We agree again. The question is what to do about it. Maybe a bug report
against mdadm?
Definitely. Again, can you find out what the exit status is under the
circumstances that mdadm produces a blank error? I am hoping it is
something other than 1. If so, solving this problem might be as simple
as having the mdraid script consider only a specific non-zero value to
indicate an intractable error.
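To illustrate the idea: the mdraid script could inspect mdadm's exit
status and treat only specific values as fatal. A minimal POSIX sh
sketch, in which mdadm_assemble is a stand-in function (the real script
would run `mdadm -As` directly) and the 0/1 meanings are assumptions
pending the bug report:

```shell
#!/bin/sh
# Stand-in for `mdadm -As`, so the branching can be demonstrated without
# real arrays; FAKE_RC simulates the exit status (default 0).
mdadm_assemble() { return "${FAKE_RC:-0}"; }

mdadm_assemble
rc=$?
case "$rc" in
    0) echo "arrays assembled" ;;
    1) echo "benign: nothing left to assemble" ;;   # hypothetical mapping
    *) echo "fatal: mdadm exited with status $rc" ;;
esac
```

The point is simply the case statement; which non-zero values are
actually benign is exactly what needs to be established.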
There is also the matter of whether it makes sense to explicitly
assemble the arrays in the script where udev rules are already doing the
job. However, I think this would require further investigation before
considering making a bug of it.
--->8
Right. Here's the position:
1. I've left /etc/init.d/mdraid out of all run levels. I have nothing but
comments in mdadm.conf, but then it's not likely to be read anyway if the
init script isn't running.
2. I have empty /etc/udev rules files as above.
3. I have kernel auto-assembly of raid enabled.
4. I don't use an init ram disk.
5. The root partition is on /dev/md5 (0.90 metadata).
6. All other partitions except /boot are under /dev/vg7 which is built on
top of /dev/md7 (1.x metadata).
7. The system boots normally.
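As an aside, the metadata version of each array can be confirmed with
mdadm's examine mode, run against a member device (the device name here
is a placeholder; needs root):

```
mdadm --examine /dev/sda7 | grep -i version
cat /proc/mdstat
```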
I must confess that this boggles my mind. Under these circumstances, I
cannot fathom how - or when - the 1.x arrays are being assembled.
Something has to be executing mdadm at some point.
I think it's udev. I had a look at the rules, but I no grok. I do see
references to mdadm though.
So would I, only you said in step 2 that you have "empty" rules, which I
take to mean that you had overridden the mdadm-provided udev rules with
empty files. If all of the conditions you describe were true, you would
have eliminated all three of the aforementioned contexts in which mdadm
can be invoked. Given that mdadm is needed to assemble your 1.x arrays
(see below), I would expect such conditions to result in mount errors on
account of the missing arrays.
Do I even need sys-fs/mdadm installed? Maybe
I'll try removing it. I have a little rescue system in the same box, so
it'd be easy to put it back if necessary.
Yes, you need mdadm because 1.x metadata arrays must be assembled in
userspace.
I realised, after writing that, that I may well need it for maintenance. I'd do
that from my rescue system though, which does have it installed, so I think I
can ditch it from the main system.
Again, 1.x arrays must be assembled in userspace. The kernel cannot
assemble them by itself as it can with 0.9x arrays. If you uninstall
mdadm, you will be removing the very userspace tool that is employed for
assembly. Neither udev nor mdraid will be able to execute it, which
cannot end well.
It's a different matter when using an initramfs, because it will bundle
and make use of its own copy of mdadm.
--Kerin