Re: [Linux-HA] pgsql OCF resource agent and other questions

Serge Dubrouski Wed, 13 Feb 2008 05:15:34 -0800

On Feb 13, 2008 4:29 AM, Zoltan Boszormenyi <[EMAIL PROTECTED]> wrote:
>
> Andrew Beekhof wrote:
> >
> > On Feb 12, 2008, at 8:57 PM, Zoltan Boszormenyi wrote:
> >
> >> Andrew Beekhof wrote:
> >>>
> >>> On Feb 12, 2008, at 4:59 PM, Zoltan Boszormenyi wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> Serge Dubrouski wrote:
> >>>>> pgsql OCF RA doesn't support multistate configuration so I don't
> >>>>> think
> >>>>> that creating a clone would be a good idea.
> >>>>>
> >>>>
> >>>> Thanks for the information.
> >>>>
> >>>> Some other questions.
> >>>>
> >>>> According to http://linux-ha.org/v2/faq/resource_too_active
> >>>> the monitor action should return 0 for running, 7 ($OCF_NOT_RUNNING)
> >>>> for downed resources and anything else for failed ones.
> >>>> Either this documentation is buggy,
> >>>
> >>> no
> >>>
> >>>> or heartbeat doesn't conform to its own docs.
> >>>
> >>> also no
> >>>
> >>>>
> >>>> Here's the scenario: londiste creates a pidfile and deletes it when
> >>>> it quits correctly.
> >>>> However, if I kill it manually then the pidfile stays. What should
> >>>> my script return
> >>>> when it detects that the process with the indicated PID is no
> >>>> longer there?
> >>>> It's not a "downed" resource, it's a failed one. So I returned
> >>>> $OCF_ERR_GENERIC.
> >>>> But after some time heartbeat says that my resource became
> >>>> "unmanaged".
> >>>
> >>> i'm guessing (because you've not included anything on which to
> >>> comment properly) that the stop action failed
> >>
> >> It shouldn't have failed, stop action always returns $OCF_SUCCESS.
> >>
> >>>> In contrast to this, the pgsql OCF RA does it differently. It
> >>>> always returns 7
> >>>> when it finds that there's no postmaster process. Which is the
> >>>> right behaviour?
> >>>
> >>> it depends what you want to happen.
> >>> if you want a stop to be sent, use OCF_ERR_GENERIC.
> >>> if the resource is stateless and doesn't need any cleaning up, use
> >>> OCF_NOT_RUNNING
> >>
> >> It's quite an important detail. Shouldn't this be documented at
> >> http://linux-ha.org/OCFResourceAgent ?
> >
> > yep.  but it's a wiki so anyone can do that :)
>
> I see. It's an excuse because no one did it yet. :-)
>
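The return-code distinction discussed above can be sketched in shell. This is a minimal illustration, not code from the pgsql RA or from the londiste script in question: the function name pidfile_monitor is hypothetical, and the return-code values are defined locally only to keep the sketch self-contained (a real agent would normally take them from the OCF shell function include shipped with heartbeat). It treats a missing pidfile as a clean stop and a stale pidfile as a failure.

```shell
#!/bin/sh
# OCF return codes used in this sketch (defined locally for illustration).
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_NOT_RUNNING=7

# Hypothetical helper: classify a daemon by its pidfile.
# $1 = path to the pidfile
pidfile_monitor() {
    pidfile="$1"

    # No pidfile: the daemon exited cleanly or was never started.
    [ -f "$pidfile" ] || return $OCF_NOT_RUNNING

    pid=$(cat "$pidfile" 2>/dev/null)

    # Empty or unreadable pidfile: report a failure.
    [ -n "$pid" ] || return $OCF_ERR_GENERIC

    # kill -0 delivers no signal; it only tests whether the process
    # exists (and whether we are allowed to signal it).
    if kill -0 "$pid" 2>/dev/null; then
        return $OCF_SUCCESS
    fi

    # Stale pidfile: the process is gone but left its pidfile behind.
    return $OCF_ERR_GENERIC
}
```

With this split, one option is to let the stop action remove the stale pidfile before it returns $OCF_SUCCESS, so that a later monitor sees a cleanly stopped resource.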
> Yesterday another problem popped up, and I don't understand why
> it didn't happen before. I upgraded to heartbeat 2.1.3 using the
> SUSE build service packages at
> http://download.opensuse.org/repositories/server:/ha-clustering/
> but the problem seems to persist. I have two pgsql resources,
> using the stock install on my Fedora 6, i.e. pgdata=/var/lib/pgsql/data.
> Both are tied to their respective nodes, symmetric_cluster on,
> the constraints' score is -INFINITY for running them on the wrong node.
> The documentation of heartbeat said that for a symmetric cluster
> it's the way to bind a resource to a node (or to a set of nodes).
> The problem is that after the first pgsql resource is started successfully
> on the first node, the second pgsql resource is checked to see whether
> it's running on the first node. Surprise, surprise: the system indicates
> that it is. As a consequence, it's marked as startup failed and
> heartbeat doesn't try to start it on the second node. Doing a cleanup
> on the failed second pgsql resource makes it start, but now the first
> pgsql resource is marked failed. I guess that, because of the cleanup,
> the second pgsql is thought to be running on node1 and is stopped.
> The monitor action of the first resource then notices that it's dead.
> Catch-22?
>
> Turning the configuration upside down (symmetric_cluster off
> and using +INFINITY rsc_location scores for binding to the correct
> node) didn't help.
>
> How can I solve this besides using a different PGDATA directory
> on the second node? The two machines are supposed to be configured
> identically regarding PostgreSQL.


Attached is a patch for pgsql that supposedly fixes this issue. Please
test it and let me know the results.

>
> And a question about 2.1.3. After the upgrade, haclient couldn't connect
> because mgmtd wasn't started. I needed to add these two lines to ha.cf:
>
> apiauth         mgmtd   uid=root
> respawn         root    /usr/lib64/heartbeat/mgmtd -v
>
> Are they really needed? They weren't for 2.0.8, and the docs say that
> they have not been necessary since 2.0.5. Did the documentation get
> outdated again, or did something break?
>
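A quick way to check whether a given ha.cf already carries both directives is sketched below. The function name has_mgmtd_directives is hypothetical, and the file path is passed as an argument because its location varies by install (/etc/ha.d/ha.cf in the usage comment is an assumption). The check only inspects the file; it says nothing about whether 2.1.3 is supposed to need these lines.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the given ha.cf contains both
# an "apiauth mgmtd ..." line and a "respawn <user> <path>/mgmtd ..." line.
# $1 = path to ha.cf
has_mgmtd_directives() {
    cf="$1"
    # Patterns are anchored at line start (after optional whitespace),
    # so commented-out directives do not count.
    grep -Eq '^[[:space:]]*apiauth[[:space:]]+mgmtd([[:space:]]|$)' "$cf" &&
    grep -Eq '^[[:space:]]*respawn[[:space:]]+[^[:space:]]+[[:space:]]+[^[:space:]]*/mgmtd([[:space:]]|$)' "$cf"
}

# Usage (path is an assumption):
#   has_mgmtd_directives /etc/ha.d/ha.cf && echo "mgmtd directives present"
```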
> --
> ----------------------------------
> Zoltán Böszörményi
> Cybertec Schönig & Schönig GmbH
> http://www.postgresql.at/
>
>
> _______________________________________________
>
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
>



-- 
Serge Dubrouski.

pgsql.patch
Description: Binary data

