On 26/08/2014 10:38, Peter Humphrey wrote:
On Monday 25 August 2014 18:46:23 Kerin Millar wrote:
On 25/08/2014 17:51, Peter Humphrey wrote:
On Monday 25 August 2014 13:35:11 Kerin Millar wrote:
I now wonder if this is a race condition between the init script running
`mdadm -As` and the fact that the mdadm package installs udev rules that
allow for automatic incremental assembly.
Isn't it just that, with the kernel auto-assembly of the root partition,
and udev rules having assembled the rest, all the work's been done by the
time the mdraid init script is called? I had wondered about the time that
udev startup takes; assembling the raids would account for it.
Yes, it's a possibility and would constitute a race condition - even
though it might ultimately be a harmless one.
I thought a race involved the competitors setting off at more-or-less the same
time, not one waiting until the other had finished. No matter.
The mdraid script can assemble arrays and runs at a particular point in
the boot sequence. The udev rules can also assemble arrays and, being
event-driven, I suspect that they are likely to prevail. The point is
that both the sequence and the timing of these two mechanisms are
non-deterministic. There is definitely the potential for a race condition. I
just don't yet know whether it is a harmful race condition.
As touched upon in the preceding post, I'd really like to know why mdadm
sees fit to return a non-zero exit code given that the arrays are actually
assembled successfully.
I can see why a dev might think "I haven't managed to do my job" here.
It may be that mdadm returns different non-zero exit codes depending on
the exact circumstances. It does have this characteristic for certain
other operations (such as --detail --test).
After all, even if the arrays are already partially or fully assembled
at the point that mdadm is executed by the mdraid init script, it surely
ought not to matter. As long as the arrays are fully assembled by the time
mdadm exits, it should return 0 to signify success. Nothing else makes
sense, in my opinion. It's absurd that the mdraid script is drawn into
printing a blank error message where nothing has gone wrong.
I agree, that is absurd.
Further, the mdadm ebuild still prints elog messages stating that mdraid
is a requirement for the boot runlevel but, with udev rules, I don't see
how that can be true. With udev being event-driven and calling mdadm
upon the introduction of a new device, the array should be up and
running as of the very moment that all the disks are seen, no matter
whether the mdraid init script is executed or not.
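For reference, the incremental assembly hook in the mdadm-provided rules
is roughly of the following shape. This is a paraphrased sketch, not the
literal file; the installed 64-md-raid-assembly.rules is longer and
varies between versions:

```
# Paraphrased sketch of mdadm's incremental assembly udev rule.
# When the kernel announces a new block device whose signature marks it
# as a RAID member, mdadm is invoked to slot it into its array; once the
# last member appears, the array goes live.
SUBSYSTEM=="block", ACTION=="add", ENV{ID_FS_TYPE}=="linux_raid_member", \
    RUN+="/sbin/mdadm --incremental $env{DEVNAME}"
```

Which is consistent with the observation above: once every member disk
has been seen, the array is up regardless of whether the mdraid init
script ever runs.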
We agree again. The question is what to do about it. Maybe a bug report
against mdadm?
Definitely. Again, can you find out what the exit status is under the
circumstances that mdadm produces a blank error? I am hoping it is
something other than 1. If so, solving this problem might be as simple
as having the mdraid script consider only a specific non-zero value to
indicate an intractable error.
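To illustrate the idea: the mdraid script could inspect mdadm's exit
status and treat only specific values as fatal. A minimal POSIX sh
sketch, in which mdadm_assemble is a stand-in function (the real script
would run `mdadm -As` directly) and the 0/1 meanings are assumptions
pending the bug report:

```shell
#!/bin/sh
# Stand-in for `mdadm -As`, so the branching can be demonstrated without
# real arrays; FAKE_RC simulates the exit status (default 0).
mdadm_assemble() { return "${FAKE_RC:-0}"; }

mdadm_assemble
rc=$?
case "$rc" in
    0) echo "arrays assembled" ;;
    1) echo "benign: nothing left to assemble" ;;   # hypothetical mapping
    *) echo "fatal: mdadm exited with status $rc" ;;
esac
```

The point is simply the case statement; which non-zero values are
actually benign is exactly what needs to be established.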
There is also the matter of whether it makes sense to explicitly
assemble the arrays in the script where udev rules are already doing the
job. However, I think this would require further investigation before
considering making a bug of it.
--->8
Right. Here's the position:
1. I've left /etc/init.d/mdraid out of all run levels. I have nothing but
comments in mdadm.conf, but then it's not likely to be read anyway if the
init script isn't running.
2. I have empty /etc/udev rules files as above.
3. I have kernel auto-assembly of raid enabled.
4. I don't use an init ram disk.
5. The root partition is on /dev/md5 (0.90 metadata).
6. All other partitions except /boot are under /dev/vg7 which is built on
top of /dev/md7 (1.x metadata).
7. The system boots normally.
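As an aside, the metadata version of each array can be confirmed with
mdadm's examine mode, run against a member device (the device name here
is a placeholder; needs root):

```
mdadm --examine /dev/sda7 | grep -i version
cat /proc/mdstat
```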
I must confess that this boggles my mind. Under these circumstances, I
cannot fathom how - or when - the 1.x arrays are being assembled.
Something has to be executing mdadm at some point.
I think it's udev. I had a look at the rules, but I no grok. I do see
references to mdadm though.
So would I, only you said in step 2 that you have "empty" rules, which I
take to mean that you had overridden the mdadm-provided udev rules with
empty files. If all of the conditions you describe were true, you would
have eliminated all three of the aforementioned contexts in which mdadm
can be invoked. Given that mdadm is needed to assemble your 1.x arrays
(see below), I would expect such conditions to result in mount errors on
account of the missing arrays.
Do I even need sys-fs/mdadm installed? Maybe
I'll try removing it. I have a little rescue system in the same box, so
it'd be easy to put it back if necessary.
Yes, you need mdadm because 1.x metadata arrays must be assembled in
userspace.
I realised, after writing that, that I may well need it for maintenance. I'd do
that from my rescue system though, which does have it installed, so I think I
can ditch it from the main system.
Again, 1.x arrays must be assembled in userspace. The kernel cannot
assemble them by itself as it can with 0.9x arrays. If you uninstall
mdadm, you will be removing the very userspace tool that is employed for
assembly. Neither udev nor mdraid will be able to execute it, which
cannot end well.
It's a different matter when using an initramfs, because it will bundle
and make use of its own copy of mdadm.
--Kerin