>>> Lars Marowsky-Bree <[email protected]> wrote on 2011-11-29 at 14:49
>>> in message <[email protected]>:
> On 2011-11-29T08:33:01, Ulrich Windl <[email protected]>
> wrote:
>
> > The state of an unmanaged resource is the state when it left the
> > managed meta-state.
>
> That is not correct. An unmanaged resource is not *managed*, but its
> state is still relevant to other resources that possibly depend on it.
But isn't it a contradiction that the cluster monitors an unmanaged
resource for state changes? "Unmanaged" means: "leave it alone, don't
care about it".

> The original design goal was for unmanaged resources to be
> "placeholders" whose state could be externally set. If you monitor them,
> they'll still be monitored.

Well, original designs are rarely perfect.

> People use it to disable operations on a resource, but the Raid1 agent
> with a > 0 check depth is special: it is documented to actually do
> something in its monitor operation, namely try to rebuild RAID sets
> (which mdadmd alas can't do). If you don't want that, don't enable it.

In HP-UX things are much more convenient for the user: LVM resynchronizes
mirrors automatically:

Nov 23 14:52:42 hp1 vmunix: LVM: WARNING: VG 64 0x1f0000: LV 2: Some I/O requests to this LV are waiting
Nov 23 14:52:42 hp1 vmunix: indefinitely for an unavailable PV. These requests will be queued until
Nov 23 14:52:42 hp1 vmunix: the PV becomes available (or a timeout is specified for the LV).
Nov 23 14:52:42 hp1 vmunix: LVM: WARNING: VG 64 0x1f0000: LV 1: Some I/O requests to this LV are waiting
Nov 23 14:52:42 hp1 vmunix: indefinitely for an unavailable PV. These requests will be queued until
Nov 23 14:52:42 hp1 vmunix: the PV becomes available (or a timeout is specified for the LV).
Nov 23 14:52:53 hp1 vmunix: LVM: VG 64 0x0a0000: PVLink 3 0x000006 Failed! The PV is not accessible.
Nov 23 14:52:54 hp1 vmunix: LVM: VG 64 0x0c0000: PVLink 3 0x000008 Failed! The PV is not accessible.
Nov 23 14:52:54 hp1 cmdisklockd[2193]: Timed out waiting for cluster lock disk /dev/disk/disk56
Nov 23 14:52:57 hp1 vmunix: LVM: VG 64 0x1e0000: PVLink 3 0x00000b Failed! The PV is not accessible.
Nov 23 14:53:00 hp1 vmunix: LVM: VG 64 0x0b0000: PVLink 3 0x000007 Failed! The PV is not accessible.
Nov 23 14:53:08 hp1 vmunix: LVM: VG 64 0x0a0000: PVLink 3 0x000006 Recovered.
Nov 23 14:53:08 hp1 vmunix: LVM: VG 64 0x1e0000: PVLink 3 0x00000b Recovered.
Nov 23 14:53:08 hp1 vmunix: LVM: VG 64 0x0b0000: PVLink 3 0x000007 Recovered.
Nov 23 14:53:08 hp1 vmunix: LVM: VG 64 0x0c0000: PVLink 3 0x000008 Recovered.
Nov 23 14:53:08 hp1 vmunix: LVM: NOTICE: VG 64 0x1f0000: LV 1: All I/O requests to this LV that were
Nov 23 14:53:08 hp1 vmunix: waiting indefinitely for an unavailable PV have now completed.
Nov 23 14:53:08 hp1 vmunix: LVM: NOTICE: VG 64 0x1f0000: LV 2: All I/O requests to this LV that were

I'm unsure how to model a degraded RAID recovery with Pacemaker.

> > It's valid to assume that an unmanaged resource does not change state,
> > or at least: if the unmanaged resource changes state, the cluster
> > should not care as long as the resource is unmanaged.
> >
> > This assumption seems more logical than re-monitoring an unmanaged
> > resource.
>
> No. Your story would also be consistent, but is not what "unmanaged"
> does (which is also consistent - there's a clear difference between
> managed and monitored). That's what we have a maintenance mode for.

Can you set a single resource to maintenance mode?

> (But the mdadm operations the RA does also shouldn't cause data
> corruption. That strikes me as an MD bug.)

Maybe (there exists a service request for it): if I explicitly remove a
disk from a RAID, the cluster should not (be able to) re-add that disk.
In my case the data on the disk had changed, but the MD RAID did not
detect that. A full re-sync would have been fine, but wasn't done (it
relied on the write-intent bitmap at the beginning of the device).

Regards,
Ulrich

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
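P.S.: Partly answering my own question for the archives. Newer Pacemaker
versions appear to support a per-resource "maintenance" meta-attribute
(with "is-managed=false" as the older per-resource switch), and the deep
Raid1 check is opt-in via OCF_CHECK_LEVEL on the monitor op. A sketch in
crm shell syntax; the resource name "myraid" and the device paths are
made-up examples, and your Pacemaker version may not have all of this:

```
# Put only this one resource into maintenance (hedged: requires a
# Pacemaker version that honours the per-resource meta-attribute):
crm resource meta myraid set maintenance true

# Example Raid1 primitive; omitting OCF_CHECK_LEVEL (or setting it to 0)
# keeps the monitor from attempting RAID-set repairs:
primitive myraid ocf:heartbeat:Raid1 \
    params raidconf="/etc/mdadm.conf" raiddev="/dev/md0" \
    op monitor interval="120s" OCF_CHECK_LEVEL="10"
```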
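P.P.S.: On forcing the full re-sync that the bitmap short-circuited: as
far as I can tell, wiping the member's MD superblock before re-adding it
discards the write-intent bitmap state, so MD has to treat the disk as
new and rebuild it completely. Device names below are examples only:

```
# Mark the outdated member failed and remove it from the array:
mdadm /dev/md0 --fail /dev/sdb1 --remove /dev/sdb1

# Wipe its MD superblock so a bitmap-based --re-add is impossible:
mdadm --zero-superblock /dev/sdb1

# Adding it back now triggers a full resync instead of a bitmap catch-up:
mdadm /dev/md0 --add /dev/sdb1
```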
