OK, I understand: in a dual-primary setup without a valid stonith configuration I have to wait until the crashed node is put into a *known* state, e.g. by a reboot or manual intervention.
But what if the crashed node never comes back up: will the stonith setup put the crashed node into a *known* state, so that the active node can continue to operate? Or do I have to intervene manually?

So for my plan to have a highly available service (which saves its state to a shared directory), a primary/secondary setup may be the way to go - or is fencing/stonith always a must?

Sent: Tuesday, 12 May 2015 at 15:11
From: Ivan <[email protected]>
To: [email protected]
Subject: Re: [DRBD-user] Dual Primary Mode: Shared Directory blocked after node crash until reboot

On 05/12/2015 02:09 PM, DRBD User wrote:
> Hi
>
> @Cesar: thx for your suggestion - but I don't want to do a manual fence.

From Digimer's replies to your posts:

1- the dlm "lock" will be released once the crashed node is set to a *known* state in pacemaker. Until it is released, forget about using your shared fs.
2- a *known* state requires a working stonith setup: either automatic (IPMI, switched PDU, ...), or manual, as Cesar described.

Now, if you don't want to use stonith and you're brave enough to risk having a split-brain (you have good backups, the data on the shared fs is transient/not important, ...), I imagine you could have a shell script with a loop running in the background that would automatically ack a manual fence when needed. Or you could write a dummy stonith agent that would always return success.

>
> during testing I found out that after pulling the power plug the shared
> directory is not completely inaccessible: it is readable, only a write
> will block until the crashed node restarts - BUT what if the crashed node
> never restarts? (my service saves its state into the shared directory and
> should not block)
>
> maybe it's better to switch from active/active to active/passive - or is
> the situation (pull power plug, blocking...) the same there?
>
> thx
>
> Sent: Tuesday, 12 May 2015 at 12:33
> From: "Cesar Peschiera" <[email protected]>
> To: "DRBD User" <[email protected]>, [email protected]
> Subject: Re: [DRBD-user] Dual Primary Mode: Shared Directory blocked after
> node crash until reboot
>
> About your fencing problem:
>
> Instead of using a hardware fence, you can use the manual fence that comes
> with the cluster software.
>
> Please read this:
> 1- It does not require any hardware.
> 2- This option isn't advisable in production environments, but it is useful
> in development environments.
> 3- The file used is "fence_ack_manual"
> 4- It is executed from the CLI on a node that is alive, to apply the fence
> to the other server.
> 5- Before using it, it is advisable to first completely disconnect the
> electric power from the server that will be fenced; the goal is to shut that
> server down hard before running the fence command.
> 6- Finally, execute this command on a node that is alive:
> Shell# /[PATH]/fence_ack_manual [IP or Name of the Node that will be fenced]
> 7- Follow the steps as directed by this command.
>
> I hope this information is helpful.
>
> Best regards
> Cesar
>
> ----- Original Message -----
> From: DRBD User [[email protected]]
> To: [email protected]
> Sent: Tuesday, May 12, 2015 5:39 AM
> Subject: [DRBD-user] Dual Primary Mode: Shared Directory blocked after node
> crash until reboot
>
>
> the DRBD status is (regardless of a 'nice' shutdown (e.g. reboot) or an
> 'abrupt' kill (e.g. pulling the power plug))
>
> cs:WFConnection ro:Primary/Unknown ds:UpToDate/Outdated
>
> but only with a 'nice' shutdown is the shared directory still accessible...
>
>
> Sent: Tuesday, 12 May 2015 at 09:44
> From: Digimer <[email protected]>
> To: "DRBD User" <[email protected]>, [email protected]
> Subject: Re: [DRBD-user] Dual Primary Mode: Shared Directory blocked after
> node crash until reboot
> On 12/05/15 03:42 AM, DRBD User wrote:
>>>>> pacemakers pcs property stonith-enabled is currently set to false
>>
>>> Well there's your problem. :)
>>
>> Since I don't have any (hardware) STONITH device, I have set stonith-enabled
>> to false.
>> DRBD's fencing rule is set to: 'fencing: resource-only'
>>
>> My goal is: if one node crashes, the other node should take over the work
>> immediately. But currently I have to wait for the crashed node to reboot.
>> I thought that in such a situation the active node (or rather the shared
>> directory) would be immediately usable?
>>
>> Maybe I should use another fence script?
>>
>> I tried to create the resource with operation 'on-fail=restart' - but no
>> success ...
>>
>> Any other suggestions?
>
> You *CAN NOT* safely proceed when a node stops responding _until_ you
> have put the lost node into a known state. To do otherwise would be to
> risk a split-brain.
>
> Good fence devices are switched PDUs, like the APC-brand AP7900 (not
> all makes/models are supported, so check first before buying other
> brands). The AP7900 can usually be found used for ~$200 and makes an
> excellent external fence device.
>
> Trying to use DRBD without proper fencing will result in pain and
> heartache. The delay needed to fence a lost node is FAR preferable to
> risking a split-brain.
>
> --
> Digimer
> Papers and Projects: https://alteeve.ca/w/
> What if the cure for cancer is trapped in the mind of a person without
> access to education?
_______________________________________________
drbd-user mailing list
[email protected]
http://lists.linbit.com/mailman/listinfo/drbd-user
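
For reading the DRBD status line quoted in this thread (cs:WFConnection ro:Primary/Unknown ds:UpToDate/Outdated), a minimal shell sketch along these lines could split it into its connection state, local/peer roles and local/peer disk states. The function name parse_drbd_status and the output labels are made up for illustration and are not part of DRBD; the sketch only splits the text and says nothing about whether the peer has actually been fenced.

```shell
#!/bin/sh
# Hypothetical helper (not part of DRBD): split a status line of the form
# "cs:<conn> ro:<local>/<peer> ds:<local>/<peer>" into labelled fields.
parse_drbd_status() {
    line=$1
    # Rely on word splitting to walk the space-separated "key:value" fields.
    for field in $line; do
        key=${field%%:*}   # text before the first ':'
        val=${field#*:}    # text after the first ':'
        case $key in
            cs) echo "connection=$val" ;;
            ro) echo "local_role=${val%%/*}"
                echo "peer_role=${val#*/}" ;;
            ds) echo "local_disk=${val%%/*}"
                echo "peer_disk=${val#*/}" ;;
        esac
    done
}

# Status line exactly as reported earlier in the thread.
parse_drbd_status "cs:WFConnection ro:Primary/Unknown ds:UpToDate/Outdated"
```

With the status from the thread this reports the peer role as Unknown and the peer disk as Outdated, which is the same picture after a 'nice' shutdown and after pulling the plug; the difference in behaviour comes from fencing, not from this line.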
