On Feb 13, 2008 4:29 AM, Zoltan Boszormenyi <[EMAIL PROTECTED]> wrote:
>
> Andrew Beekhof wrote:
> >
> > On Feb 12, 2008, at 8:57 PM, Zoltan Boszormenyi wrote:
> >
> >> Andrew Beekhof wrote:
> >>>
> >>> On Feb 12, 2008, at 4:59 PM, Zoltan Boszormenyi wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> Serge Dubrouski wrote:
> >>>>> pgsql OCF RA doesn't support multistate configuration so I don't
> >>>>> think that creating a clone would be a good idea.
> >>>>>
> >>>>
> >>>> Thanks for the information.
> >>>>
> >>>> Some other questions.
> >>>>
> >>>> According to http://linux-ha.org/v2/faq/resource_too_active
> >>>> the monitor action should return 0 for running, 7 ($OCF_NOT_RUNNING)
> >>>> for downed resources and anything else for failed ones.
> >>>> Either this documentation is buggy,
> >>>
> >>> no
> >>>
> >>>> or heartbeat doesn't conform to its own docs.
> >>>
> >>> also no
> >>>
> >>>>
> >>>> Here's the scenario: londiste creates a pidfile and deletes it when
> >>>> it quits correctly. However, if I kill it manually then the pidfile
> >>>> stays. What should my script return when it detects that the process
> >>>> with the indicated PID is no longer there? It's not a "downed"
> >>>> resource, it's a failed one. So I returned $OCF_ERR_GENERIC.
> >>>> But after some time heartbeat says that my resource became
> >>>> "unmanaged".
> >>>
> >>> i'm guessing (because you've not included anything on which to
> >>> comment properly) that the stop action failed
> >>
> >> It shouldn't have failed, stop action always returns $OCF_SUCCESS.
> >>
> >>>> In contrast to this, the pgsql OCF RA does it differently. It
> >>>> always returns 7 when it finds that there's no postmaster process.
> >>>> Which is the right behaviour?
> >>>
> >>> it depends what you want to happen.
> >>> if you want a stop to be sent, use OCF_ERR_GENERIC.
> >>> if the resource is stateless and doesn't need any cleaning up, use
> >>> OCF_NOT_RUNNING
> >>
> >> It's quite an important detail. Shouldn't this be documented at
> >> http://linux-ha.org/OCFResourceAgent ?
> >
> > yep. but it's a wiki so anyone can do that :)
>
> I see. It's an excuse because no one did it yet. :-)
>
> Yesterday another problem popped up and I don't understand why
> it didn't happen before. I upgraded to heartbeat 2.1.3 using the
> SuSe build service packages at
> http://download.opensuse.org/repositories/server:/ha-clustering/
> but the problem seems to persist. I have two pgsql resources,
> using the stock install on my Fedora 6, i.e. pgdata=/var/lib/pgsql/data.
> Both are tied to their respective nodes, symmetric_cluster on,
> the constraints' score is -INFINITY for running them on the wrong node.
> The documentation of heartbeat said that for a symmetric cluster
> it's the way to bind a resource to a node (or to a set of nodes).
> The problem is that after the first pgsql resource is started successfully
> on the first node, the second pgsql resource is checked to see whether
> it's running on the first node - surprise, surprise, the system indicates
> that it is. As a consequence, it's marked as startup failed and
> heartbeat doesn't try to start it on the second node. Doing a cleanup
> on the failed second pgsql resource makes it start but now the first
> pgsql resource is marked failed. I guess because of the cleanup,
> the second pgsql is thought to be running on node1 and is stopped.
> The monitor action of the first resource notices that it's dead.
> Catch 22?
>
> Turning the configuration upside down (symmetric_cluster off
> and using +INFINITY rsc_location scores for binding to the correct
> node) didn't help.
>
> How can I solve this besides using a different PGDATA directory
> on the second node? The two machines are supposed to be configured
> identically regarding PostgreSQL.

Attached is a patch for pgsql that supposedly fixes this issue. Please
test it and let me know the results.

>
> And a question about 2.1.3. After the upgrade, haclient couldn't connect
> because mgmtd wasn't started. I needed to add these two lines to ha.cf:
>
> apiauth mgmtd uid=root
> respawn root /usr/lib64/heartbeat/mgmtd -v
>
> Is it really needed? It wasn't for 2.0.8 and the docs say that
> it's not necessary since 2.0.5. Documentation got outdated again,
> or something broke?
>
> --
> ----------------------------------
> Zoltán Böszörményi
> Cybertec Schönig & Schönig GmbH
> http://www.postgresql.at/
>
>
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
>

--
Serge Dubrouski.
pgsql.patch
Description: Binary data
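
For reference, the monitor return-code choice discussed in the quoted thread (0 for running, 7 for cleanly stopped, anything else for failed) can be sketched roughly as below. This is only an illustration in Python, not the londiste script or the pgsql RA from this thread: the function names `process_alive` and `monitor` and the `stale_is_failure` switch are hypothetical, and the value 1 for OCF_ERR_GENERIC is taken from the OCF convention rather than from the messages above.

```python
import os

# OCF exit codes; 0 and 7 are quoted in the thread, 1 is the usual
# value of OCF_ERR_GENERIC (assumed here, not stated in the thread).
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1
OCF_NOT_RUNNING = 7


def process_alive(pid):
    """Return True if a process with this PID exists (signal 0 sends nothing)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True


def monitor(pidfile, stale_is_failure=True):
    """Hypothetical monitor action for a daemon that writes a pidfile."""
    try:
        with open(pidfile) as f:
            pid = int(f.read().strip())
    except FileNotFoundError:
        # A clean shutdown removed the pidfile: the resource is simply down.
        return OCF_NOT_RUNNING
    except ValueError:
        # The pidfile exists but does not contain a number.
        return OCF_ERR_GENERIC
    if pid <= 0:
        return OCF_ERR_GENERIC
    if process_alive(pid):
        return OCF_SUCCESS
    # Stale pidfile: the process died without cleaning up.
    # Report a failure if a stop should be sent, "not running" otherwise.
    return OCF_ERR_GENERIC if stale_is_failure else OCF_NOT_RUNNING
```

With `stale_is_failure=True` the sketch follows the "use OCF_ERR_GENERIC if you want a stop to be sent" advice; with `False` it mimics the behaviour described for the pgsql RA, which reports 7 whenever no process is found.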
