[DRBD-user] Dual Primary Mode: Shared Directory blocked after node crash until reboot

DRBD User Tue, 12 May 2015 23:37:45 -0700

OK, I understand:

In a dual primary setup without a valid stonith configuration I have to wait
until the crashed node is put into a *known* state, e.g. by a reboot or by
manual intervention.


But what if the crashed node never comes back up?
Will the stonith setup put the crashed node into a *known* state, so that the
active node can continue to operate?
Or do I have to intervene manually?

So for my plan to run a highly available service (which saves its state to a
shared directory), a primary/secondary setup may be the way to go. Or is
fencing/stonith always a must?
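
For watching this situation from a script, the status line quoted further
down in this thread (cs:WFConnection ro:Primary/Unknown ds:UpToDate/Outdated)
can be split into its fields. The following is only a minimal sketch: the
function names parse_drbd_status and peer_state_unknown are hypothetical, and
the input is assumed to look like the fragment quoted in this thread (cs =
connection state, ro = local/peer role, ds = local/peer disk state). It only
reports what DRBD sees; it does not put the peer into a *known* state in the
cluster sense, which, as the replies below explain, requires fencing.

```python
def parse_drbd_status(line):
    """Split a DRBD status fragment into its cs/ro/ds fields."""
    fields = {}
    for token in line.split():
        if ":" not in token:
            continue  # ignore anything that is not a key:value token
        key, value = token.split(":", 1)
        fields[key] = value
    # ro and ds are reported as local/peer pairs
    local_role, _, peer_role = fields.get("ro", "").partition("/")
    local_disk, _, peer_disk = fields.get("ds", "").partition("/")
    return {
        "connection": fields.get("cs"),
        "local_role": local_role,
        "peer_role": peer_role,
        "local_disk": local_disk,
        "peer_disk": peer_disk,
    }


def peer_state_unknown(status):
    """Hypothetical helper: True when DRBD cannot see the peer's role."""
    return status["peer_role"] == "Unknown"
```

Called on the line from this thread, peer_state_unknown() would report that
the peer's role is not visible to DRBD, which matches the blocked situation
described here.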
 
 

Sent: Tuesday, 12 May 2015 at 15:11
From: Ivan <[email protected]>
To: [email protected]
Subject: Re: [DRBD-user] Dual Primary Mode: Shared Directory blocked after node 
crash until reboot
On 05/12/2015 02:09 PM, DRBD User wrote:
> Hi
>
> @Cesar: thanks for your suggestion, but I don't want to fence manually.

from Digimer's replies to your posts:

1- the dlm "lock" will be released once the crashed node is set to a
*known* state in pacemaker. Without releasing, forget about using your
shared fs.
2- a *known* state requires a working stonith setup: either automatic
(IPMI, switched PDU, ...), or manual, as Cesar described.

Now, if you don't want to use stonith and you're brave enough to risk
having a split-brain (you have good backups, the data on the shared fs
is transient/not important, ...), I imagine you could have a shell
script with a loop running in the background that would automatically
ack a manual fence when needed. Or you could write a dummy stonith agent
that would always return success.
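
To make the second idea concrete, here is a rough sketch of what such a dummy
agent could look like. It is an illustration under assumptions, not a tested
agent: it assumes the common fence-agent convention that options arrive on
stdin as name=value lines with an "action" key, the function names
(parse_options, handle, main) are hypothetical, and the "metadata" action that
a real agent has to answer with an XML description is left out. As said above,
always reporting success defeats the purpose of fencing and risks a
split-brain, so this belongs on throwaway test clusters only.

```python
#!/usr/bin/env python3
"""Sketch of an always-succeed fence agent. UNSAFE: test clusters only."""
import sys


def parse_options(lines):
    """Parse name=value lines (assumed stdin convention for fence agents)."""
    options = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments and malformed lines
        key, value = line.split("=", 1)
        options[key.strip()] = value.strip()
    return options


def handle(options):
    """Return an exit code: 0 for every known action, without fencing anything."""
    action = options.get("action", "reboot")
    if action in ("on", "off", "reboot", "status", "monitor", "list"):
        return 0  # claims success although nothing was powered off
    return 1  # unknown action (including "metadata", omitted in this sketch)


def main(stdin=sys.stdin):
    """Entry point; an installed agent would end with sys.exit(main())."""
    return handle(parse_options(stdin))
```

The point of the sketch is the lie in handle(): the cluster is told the lost
node is off, the dlm lock is released, and nothing protects the data if that
node is in fact still running.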



>
> During testing I found that after pulling the power plug the shared
> directory is not completely inaccessible: it is readable, and only a write
> will block until the crashed node restarts. BUT what if the crashed node
> never restarts? (My service saves its state into the shared directory and
> should not block.)
>
> Maybe it's better to switch from active/active to active/passive. Or is the
> situation (pull power plug, blocking...) the same there?
>
> thx
>
> Sent: Tuesday, 12 May 2015 at 12:33
> From: "Cesar Peschiera" <[email protected]>
> To: "DRBD User" <[email protected]>, [email protected]
> Subject: Re: [DRBD-user] Dual Primary Mode: Shared Directory blocked after 
> node crash until reboot
>
> Regarding your fencing problem:
>
> Instead of a hardware fence, you can use the manual fence that comes with
> the cluster software.
>
> Please note:
> 1- It does not require any hardware.
> 2- This option isn't advisable in production environments, but it is useful
> in development environments.
> 3- The program used is "fence_ack_manual".
> 4- It is run from the CLI on a node that is alive, to apply the fence to the
> other server.
> 5- Before using it, it is advisable to completely disconnect the electric
> power from the server that will be fenced; the goal is to hard power-off
> that server before running the fence command.
> 6- Finally, execute this command on a node that is alive:
> Shell# /[PATH]/fence_ack_manual [IP or name of the node that will be fenced]
> 7- Follow the steps as directed by this command.
>
> I hope this information is helpful.
>
> Best regards
> Cesar
>
> ----- Original Message -----
> From: DRBD User[[email protected]]
> To: [email protected]
> Sent: Tuesday, May 12, 2015 5:39 AM
> Subject: [DRBD-user] Dual Primary Mode: Shared Directory blocked after node 
> crash until reboot
>
>
> The DRBD status is the same regardless of a 'nice' shutdown (e.g. reboot) or
> an 'abrupt' kill (e.g. pulling the power plug):
>
> cs:WFConnection ro:Primary/Unknown ds:UpToDate/Outdated
>
> But only after a 'nice' shutdown is the shared directory still accessible...
>
>
> Sent: Tuesday, 12 May 2015 at 09:44
> From: Digimer <[email protected]>
> To: "DRBD User" <[email protected]>, [email protected]
> Subject: Re: [DRBD-user] Dual Primary Mode: Shared Directory blocked after 
> node crash until reboot
> On 12/05/15 03:42 AM, DRBD User wrote:
>>>>> Pacemaker's pcs property stonith-enabled is currently set to false
>>
>>> Well there's your problem. :)
>>
>> Since I don't have any (hardware) STONITH device, I have set
>> stonith-enabled to false.
>> DRBD's fencing rule is set to: 'fencing: resource-only'
>>
>> My goal is: if one node crashes, the other node should take over the work
>> immediately. But currently I have to wait for the crashed node to reboot.
>> I thought that in such a situation the active node (or rather the shared
>> directory) would be usable immediately?
>>
>> Maybe I should use another fence script?
>>
>> I tried to create the resource with the operation 'on-fail=restart', but
>> without success...
>>
>> Any other suggestions?
>
> You *CAN NOT* safely proceed when a node stops responding _until_ you
> have put the lost node into a known state. To do otherwise would be to
> risk a split-brain.
>
> A good fence device is a switched PDU, like the APC-brand AP7900 (not
> all makes/models are supported, so check first before buying other
> brands). The AP7900 can usually be found used for ~$200 and makes an
> excellent external fence device.
>
> Trying to use DRBD without proper fencing will result in pain and
> heartache. The delay needed to fence a lost node is FAR preferable to
> risking a split-brain.
>
> --
> Digimer
> Papers and Projects: https://alteeve.ca/w/
> What if the cure for cancer is trapped in the mind of a person without
> access to education?
> _______________________________________________
> drbd-user mailing list
> [email protected]
> http://lists.linbit.com/mailman/listinfo/drbd-user
>
