On 16:16, Wed 24 Nov 10, Ante Karamatić wrote:
> On Wed, 24 Nov 2010, at 10:52 +0000, Dave Williams wrote:
>
> > I currently have a production clustered server down because of this and
> > the fact that ubuntu (I'm advised) have an inconsistently compiled set
> > of HA components. Certainly both lucid and maverick released packages
> > leave defunct processes lying around and give highly unreliable
> > operation :-(
>
> Can you elaborate on this?

This is typical of the result of installing the pacemaker, corosync and
cluster-glue stack on otherwise reasonably clean machines:

root     20586  0.0  0.2 227056  5704 ?  Ssl  Nov17  4:39 /usr/sbin/corosync
root     20593  0.0  0.0      0     0 ?  Z    Nov17  0:00  \_ [stonithd] <defunct>
108      20594  0.0  0.2  80624  4636 ?  S    Nov17  0:05  \_ /usr/lib/heartbeat/cib
root     20595  0.0  0.0      0     0 ?  Z    Nov17  0:00  \_ [lrmd] <defunct>
108      20596  0.0  0.1  81568  2780 ?  S    Nov17  0:06  \_ /usr/lib/heartbeat/attrd
108      20597  0.0  0.0      0     0 ?  Z    Nov17  0:00  \_ [pengine] <defunct>
108      20598  0.0  0.1  87796  3060 ?  S    Nov17  0:05  \_ /usr/lib/heartbeat/crmd
108      20601  0.0  0.2  81016  5696 ?  S    Nov17  0:30  \_ /usr/lib/heartbeat/cib
root     20602  0.0  0.0  36424  1340 ?  S    Nov17  0:07  \_ /usr/lib/heartbeat/lrmd
108      20603  0.0  0.1  81568  3296 ?  S    Nov17  0:00  \_ /usr/lib/heartbeat/attrd
108      20604  0.0  0.1  81916  2796 ?  S    Nov17  0:00  \_ /usr/lib/heartbeat/pengine
root     20608  0.0  0.0      0     0 ?  Z    Nov17  0:00  \_ [corosync] <defunct>
root     20609  0.0  0.0      0     0 ?  Z    Nov17  0:00  \_ [corosync] <defunct>
root     20613  0.0  0.0      0     0 ?  Z    Nov17  0:00  \_ [corosync] <defunct>

It is the same irrespective of lucid/maverick, cluster-glue with or
without upstart, and 32/64 bits. These are all on ubuntu-server, not
desktop.

> OTOH, upstart plugin in ubuntu packages include one patch that wasn't
> accepted upstream, cause of which upstart plugin works.

I appreciate the ubuntu cluster-glue package with upstart is new, but
sadly it wasn't obvious from the various "announcements" I found that
there were problems with it. I guess I shouldn't be so optimistic. The
current HA stack is quite a change from the original heartbeat-based
solution I had, so there is a lot to learn. You know what it is like when
there is pressure to get things going (in our case a serious hardware
failure which required complete server replacement) - you end up
understanding the absolute minimum required to reach your
(customer's/boss's) goals.

> It's a known problem that upstream's version of cluster-glue doesn't
> work yet with upstart and, pointing at my self, we still didn't test the
> solution Dejan proposed. I'll do that today or tomorrow.

Maybe we can work in parallel on this. As I said, I'm happy to assist
where I can. Whilst I am a seasoned software professional, I am new to
glib, so I have a steep learning curve to climb in that respect!

> Dejan, sorry for not responding sooner. I'm having hard time finding some
> free time to work on this :(

Ditto :-(

_______________________________________________________
Linux-HA-Dev: [email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
