On Tue, 15 Apr 2008, Jan Chaloupecky wrote:

> I have a general question about using HA on Solaris 9, or on Solaris more
> generally. Has anybody successfully compiled and run HA? I ran into two
> problems that I can't solve. I searched the mailing list archives and saw
> people with the same issues, but I found no solution.

You don't say which version of heartbeat you are using.  You ought to try
the latest: 2.1.3.

Background: By far the largest userbase of the software is on Linux.
That is where it is most stable.  The heartbeat/Linux-HA team have been
fully supportive of portability in general, and over the years several of
us have tried to ensure that it runs on Solaris and various flavours of
BSD.  But inevitably it is still more stable on Linux than on other Unix
flavours.


> 1st Problem: ncurses
> During ./configure I get a warning that the ncurses version is either
> not present or too old to be used. I run ./configure with the
> --includedir=/opt/csw/include/ncurses/ option because otherwise it can't
> find the headers at all, but I get this message:
>
> configure: WARNING: The printw() function of your ncurses or curses library 
> is old, we will disable usage of the library. If you want to use this library 
> anyway, please update to newer version of the library, ncurses 5.4 or later 
> is recommended. You can get the library from 
> http://www.gnu.org/software/ncurses/.
> configure: Disabling curses

1. I strongly recommend using heartbeat's own higher-level "ConfigureMe"
rather than running "configure" directly.  (Just in case that's what
you have been doing.)

2. "/opt/csw": that sounds as if you are using Blastwave for auxiliary
software.  I've personally had good success with that.  It can help to
have things such as "/opt/csw/bin" early in your $PATH, so they take
precedence over same-name things in Solaris itself.

3. But I think I have always used Solaris's own curses rather than CSW's.
(I can't remember; I'd need to check.)  Don't get too distracted by
curses; it's not essential to basic heartbeat running.  (Your patches
would be welcome, though!)


> 2nd Problem: heartbeat -k doesn't stop
>
> I installed heartbeat on Solaris, made a minimal configuration in ha.cf (see
> attachment), and created an authkey file, but when I try to start/stop
> heartbeat, it will not stop. The /opt/heartbeat/lib/heartbeat/heartbeat -k
> command just "hangs" and all the heartbeat processes are still running:
>
> $ ps -ef | grep heartbeat
>   nobody  1434  1429  0 12:03:28 ?        0:00 /opt/heartbeat/lib/heartbeat/heartbeat
>   nobody  1433  1429  0 12:03:28 ?        0:00 /opt/heartbeat/lib/heartbeat/heartbeat
>     root  1429     1  0 12:03:28 ?        0:01 /opt/heartbeat/lib/heartbeat/heartbeat
>   nobody  1435  1429  0 12:03:28 ?        0:00 /opt/heartbeat/lib/heartbeat/heartbeat
>     root  1489     1  0 12:06:54 pts/1    0:00 /opt/heartbeat/lib/heartbeat/heartbeat -k
>     root  1442  1429  0 12:03:50 ?        0:00 /opt/heartbeat/lib/heartbeat/lrmd -r
>   nobody  1443  1429  0 12:03:50 ?        0:00 /opt/heartbeat/lib/heartbeat/stonithd
> hacluste  1444  1429  0 12:03:50 ?        0:00 /opt/heartbeat/lib/heartbeat/attrd
> hacluste  1445  1429  0 12:03:50 ?        0:00 /opt/heartbeat/lib/heartbeat/crmd
>     root  1446  1429  0 12:03:50 ?        0:00 /opt/heartbeat/lib/heartbeat/mgmtd -v
>     root  1469     1  0 12:04:16 pts/1    0:00 /opt/heartbeat/lib/heartbeat/heartbeat -k
>
>
> Did I miss a critical step in the configuration ?

Hmmm... I've got a vague recollection of something like this.  I seem to
recall finding and fixing a particular bug in one place, and noting that
there was a similar bug elsewhere but not having the test environment at
the time to be able to fix that second occurrence.

It was something to do with a process doing 'exec("sh foo")' and whether
"foo" replaced the "sh" (Linux, thus making "foo" a child of the original
process) or was itself a child of "sh" (Solaris, thus making "foo" a
grandchild).  And "heartbeat -k" works on the basis of the 'child' model
(Linux) and had trouble with the 'grandchild' model (Solaris).  Something
like that...

I suspect a lurking bug there. If I get time in the next few days I'll see
whether I can refresh my memory of it.  (I seem to recall that we
discussed it on the "linux-ha-dev" list.)
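
For illustration only (this is not heartbeat's code, and not necessarily how
the bug should be fixed): a minimal Python sketch of the child/grandchild
distinction described above.  It assumes a POSIX system with "sh" and
"sleep" available; the helper names spawn_via_shell and stop_group are made
up for this example.  A command such as 'sleep 30; true' forces the
grandchild model, because the shell has to stay alive to run "true"
afterwards.  Signalling the process group, rather than one PID, reaches the
worker whichever model the shell follows.

```python
import os
import signal
import subprocess


def spawn_via_shell(command):
    """Run `command` through sh in a new session (hypothetical helper).

    start_new_session=True makes the shell a session and process-group
    leader, so its PID doubles as the process-group ID.  Depending on the
    shell, the worker is either the shell itself (it exec'd the command)
    or a child of the shell, i.e. our grandchild.
    """
    return subprocess.Popen(
        ["sh", "-c", command],
        stdout=subprocess.PIPE,
        start_new_session=True,
    )


def stop_group(proc, timeout=5.0):
    """Send SIGTERM to the whole process group and wait for it to finish.

    communicate() returns only once stdout reaches EOF, which happens when
    every process that inherited the pipe (child or grandchild) has exited.
    If a grandchild survived, this raises subprocess.TimeoutExpired.
    """
    os.killpg(proc.pid, signal.SIGTERM)
    proc.communicate(timeout=timeout)
    return proc.returncode
```

Sending the signal to proc.pid alone would stop only the shell in the
grandchild case, leaving the worker running; that is the general kind of
mismatch I have in mind, though the details in heartbeat may well differ.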

Meanwhile, if you do come up with any bug fixes and patches, please feel
free to submit them!  They would be most welcome.



-- 

:  David Lee                                I.T. Service          :
:  Senior Systems Programmer                Computer Centre       :
:  UNIX Team Leader                         Durham University     :
:                                           South Road            :
:  http://www.dur.ac.uk/t.d.lee/            Durham DH1 3LE        :
:  Phone: +44 191 334 2752                  U.K.                  :
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems