Do not worry, I do not need HA support with rules/CIBs etc. Regardless of which version of HA I use, DRBD and OCFS2 are where my issues lie at this point.
That is why I called my post 'did not work out so well' instead of giving a good technical description of my problem. The main theme of the post is that the setup is complex and that I am having a lot of multi-layer issues. I got started down this track because, while reading about DRBD, I found a PDF that lists what DRBD 8 supports; active/active with OCFS2/GFS is checked. I just wanted to share my experiences with the entire process.

While I think all the pieces are fundamentally sound, troubleshooting, configuring, and managing DRBD, OCFS2, and Heartbeat has been quite an adventure. We all know that setup is only half the battle. The real question is whether all this setup and management constitutes an effective strategy. Based on my experimenting, I would have to say no. I think the loose coupling of DRBD, OCFS2, and HA is not very effective.

The system takes far longer to set up. We normally build all our systems with kickstart; the DRBD, HA, and OCFS2 parts are a manual process that has to be done after install. Then we have to integrate all these parts with Heartbeat. It took a fair amount of playing around before I found the option most people settle on (clones of the init.d scripts). After all the setup, the system happens to be very unstable: systems that will not reboot, high I/O wait, DRBD and OCFS2 kernel modules refusing to load and unload (again, more of an OCFS2/DRBD problem than an HA problem).

Uptime is important, but time spent on management and setup is just as important, if not more so. The fact is, with a decent backup, a cold spare system, and a kickstart file, I could probably bring this system back in half an hour. Yet I have spent days/weeks worth of work troubleshooting DRBD, OCFS2, and Heartbeat. I have had to fail over and back a number of times, and also call the data center for reboots.
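For anyone wanting to try the same dual-primary setup, the DRBD 8 side looks roughly like this in drbd.conf. This is only a minimal sketch of the active/active (allow-two-primaries) mode the DRBD PDF advertises; the resource name, disks, and addresses here are made up, and the split-brain policies are just one reasonable choice:

```
# Sketch of a DRBD 8 dual-primary resource suitable for OCFS2.
# Resource name, devices, hostnames, and addresses are hypothetical.
resource r0 {
  protocol C;                 # synchronous replication, required for dual-primary
  net {
    allow-two-primaries;      # enables active/active (both nodes Primary)
    after-sb-0pri discard-zero-changes;
    after-sb-1pri discard-secondary;
  }
  startup {
    become-primary-on both;   # promote both nodes at startup
  }
  on node1 {
    device    /dev/drbd0;
    disk      /dev/sda3;
    address   10.0.0.1:7788;
    meta-disk internal;
  }
  on node2 {
    device    /dev/drbd0;
    disk      /dev/sda3;
    address   10.0.0.2:7788;
    meta-disk internal;
  }
}
```

With both nodes Primary you can then mkfs.ocfs2 the /dev/drbd0 device and mount it on both sides, which is exactly the layering where my module load/unload trouble showed up.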
On 8/8/07, Andrew Beekhof <[EMAIL PROTECTED]> wrote:
> On 8/8/07, Eddie C <[EMAIL PROTECTED]> wrote:
> > A shared firewire disk and a shared storage array are both options, but then
> > you only have a single point of failure. Theoretically a good disk array is
> > very resilient, but it is still an SPOF.
> >
> > The great thing about DRBD active/active and OCFS2 is that you eliminate the
> > SPOF.
> >
> > Also, other options (SCSI locking, firewire) are all tied to specific
> > hardware/SANs. This solution would be very generic.
> >
> > Point taken: most of my problems are DRBD- and OCFS2-specific problems.
> >
> > I understand the point that leaving the resource unmanaged is not a good
> > thing. I know a fair amount about Heartbeat colocations and orders.
> > Here is a more technical description of what I was trying to do
> > Heartbeat-wise.
> >
> > Resource 1: VIP (used IPaddr2)
> > Resource 2: IP route (created an RA for this)
> > Resource 3: Web scraping utility (used init script)
> > Resource 4: Process to work with web scraping and usenet data
> > Resource 5: Usenet scraping utility
> > Resource 6: OCFS2 (cloned)
> > Resource 7: DRBD (cloned)
> >
> > This was my first design:
> > Order1 - Start 7 before 6
> > Group1 - Resources 1 and 2, processes 3, 4, 5
> >
> > This worked well, but since everything was grouped, a failed resource in
> > Group1 caused everything to fail and possibly restart/move. Anyone connected
> > lost their connection as the VIP left and came back a few seconds later.
> > This scenario was deemed unacceptable.
> >
> > So then I tried writing a bunch of colocation rules:
> > Colocate 4 and 5
> > Colocate 3 and 4
> > Colocate Group1 and 4
> > That had the same effect as grouping, though: if an item failed, it would
> > cause its colocation to fail, which would take down all the other
> > colocations.
> >
> > What I really needed was a way to say: I need this resource to run wherever
> > the VIP is running.
> > The VIP should only be running on a node with the shared disk running.
> > PLACE seems only to be able to tell a resource to run on a node.
> >
> > So I tried that implementation:
> >
> > Resource 1: VIP --PLACE node1 100
> > Resource 2: IP route --PLACE node1 100
> > Resource 3: Web scraping utility --PLACE node1 100
> > Resource 4: Process to work with web scraping and usenet data --PLACE node1 100
> > Resource 5: Usenet scraping utility --PLACE node1 100
> >
> > This worked well because now everything is loosely coupled and can still
> > fail over, but failing over the VIP and route does not fail over resources
> > 3, 4, and 5.
> >
> > So neither PLACE nor colocation can really express: I need this resource to
> > run only where another resource is, but if this resource cannot start, don't
> > fail the parent; and if the parent does fail, the resource should notice and
> > move with it. A one-way dependency.
>
> finally some clue as to what version you're running!
>
> please update, we've been able to do one-way colocation since 2.0.8
>
> people really do make life hard on themselves when they don't provide
> the relevant information to the people they want help from
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
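For the archives: the one-way colocation Andrew mentions can be expressed as directional constraints in the Heartbeat 2.0.8+ CIB. The sketch below is my understanding of the syntax, not a tested config; the resource ids (vip, web_scraper, ocfs2_clone, drbd_clone) are hypothetical names standing in for the resource list above:

```xml
<!-- Sketch of one-way colocation in the Heartbeat 2 CIB constraints
     section. Resource ids are hypothetical. -->
<constraints>
  <!-- web_scraper runs only where vip runs; vip does NOT depend on
       web_scraper, so a scraper failure does not move the VIP -->
  <rsc_colocation id="scraper_with_vip" from="web_scraper"
                  to="vip" score="INFINITY"/>
  <!-- vip runs only on a node where the cloned OCFS2 filesystem
       is active -->
  <rsc_colocation id="vip_with_fs" from="vip"
                  to="ocfs2_clone" score="INFINITY"/>
  <!-- start the DRBD clone before the OCFS2 clone -->
  <rsc_order id="fs_after_drbd" from="ocfs2_clone"
             type="after" to="drbd_clone"/>
</constraints>
```

The key point is that rsc_colocation is asymmetric: the "from" resource follows the "to" resource, but not the other way around, which is exactly the parent/child behavior described above.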
