On 02/05/2013 10:31 PM, Lars Marowsky-Bree wrote:
> On 2013-02-05T11:36:30, Ulrich Windl <[email protected]> wrote:
> 
> This looks like a support incident to me. Hard to diagnose without full
> logs.
> 
>> Let me add: I'm not completely sure, but a side-effect of these messages
>> seems to be that resources (being cleaned up) that are running (e.g. Xen
>> VMs) are considered "stopped". If the CRM tried to start the VM elsewhere,
>> data corruption or other bad effects are likely...
>>
>> So I wonder: I thought that cleaning up a resource just resets the
>> fail-count for the nodes where the resource couldn't start before. Does it
>> (and should it?) really clear the "running" status?
> 
> This part is normal. Cleanup removes the resource's state from the
> cluster/LRM completely (this includes the failure counts), and the
> resource is then reprobed.
> 
> This does not cause concurrency violations, even though it is true that
> the resource briefly shows up as "not running" in crm_mon/hawk.
> 
> Perhaps a new state "not probed" would be useful, since the
> probe_complete attribute is available in the CIB? Cc'ing Tim for his
> opinion.

Good point.  Even if it's generally only a brief window where resources
are shown as stopped after cleanup (even though they're never actually
stopped), that could be confusing.  In Hawk's case, the status display
is implemented such that resources with no LRM state are reported as
Stopped, where strictly they should probably show as Unknown (or, as you
say, "Not Probed").  I'll make a note to do something about that.
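To illustrate the distinction being discussed, here is a minimal Python sketch. It is not Hawk's or crm_mon's actual code, and the names `display_status`, `lrm_history` and `Status` are hypothetical. It models one node's LRM history as a map from resource id to the return code of the last probe/monitor, using the standard OCF codes 0 (OCF_SUCCESS, running) and 7 (OCF_NOT_RUNNING, stopped). Missing history is reported as "Not Probed" (or "Unknown" once the node claims its probes are complete) instead of "Stopped". The assumption that cleanup also clears the node's probe_complete flag until the reprobe finishes belongs to this model only.

```python
from enum import Enum
from typing import Dict, Optional

# Standard OCF return codes for monitor/probe operations.
OCF_SUCCESS = 0      # resource is running
OCF_NOT_RUNNING = 7  # resource is cleanly stopped


class Status(Enum):
    STARTED = "Started"
    STOPPED = "Stopped"
    FAILED = "Failed"
    NOT_PROBED = "Not Probed"
    UNKNOWN = "Unknown"


def display_status(last_rc: Optional[int], probe_complete: bool) -> Status:
    """Map the last probe/monitor result on one node to a display state.

    last_rc is None when the LRM holds no history for the resource,
    e.g. right after a cleanup and before the reprobe has run.
    """
    if last_rc is None:
        # Showing this case as Stopped is what produces the misleading
        # window after cleanup, so report it as a distinct state instead.
        return Status.UNKNOWN if probe_complete else Status.NOT_PROBED
    if last_rc == OCF_SUCCESS:
        return Status.STARTED
    if last_rc == OCF_NOT_RUNNING:
        return Status.STOPPED
    return Status.FAILED


# Hypothetical per-node LRM history: resource id -> last probe/monitor rc.
lrm_history: Dict[str, int] = {"vm1": OCF_SUCCESS}
before = display_status(lrm_history.get("vm1"), probe_complete=True)

# Cleanup (modelled): the resource's state is removed entirely and the
# node's probe_complete flag is cleared until the reprobe finishes.
lrm_history.pop("vm1")
during = display_status(lrm_history.get("vm1"), probe_complete=False)

# Reprobe: the VM never actually stopped, so the probe reports it running.
lrm_history["vm1"] = OCF_SUCCESS
after = display_status(lrm_history.get("vm1"), probe_complete=True)

print(before.value, "->", during.value, "->", after.value)
```

Run as-is, this prints "Started -> Not Probed -> Started". A display that maps missing history straight to Stopped would instead print "Started -> Stopped -> Started", which is the confusing transient state described above.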

I'm not sure why crm_mon seems to show non-probed resources as Stopped
(it's been some time since I went digging through the pengine/unpack code).

Regards,

Tim
-- 
Tim Serong
Senior Clustering Engineer
SUSE
[email protected]
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
