On Tue, Oct 30, 2012 at 7:11 PM, James Guthrie <[email protected]> wrote:
> Hi Andrew,
>
> In which category should I file the bug? Based on my issues I'm assuming
> "Pacemaker" > "Other" or maybe "Linux-HA" > "CRM Misc."?

http://bugs.clusterlabs.org/enter_bug.cgi?product=Pacemaker and then "Core"

>
> I seem to be unable to use crm_report as my install is a "Non-standard
> Pacemaker installation",

Really? How did you install? Usually it can find things anyway.

> the documentation doesn't suggest that there's
> the possibility to give a path at which the required files can be found.
> Does it make sense to manually put the files together?

Sure. In your case, I mostly need the logs and the corosync config.

>
> Regards,
> James
>
> On 10/30/2012 05:55 AM, Andrew Beekhof wrote:
>> Can you file a bug for this and include a crm_report tarball?
>> It sounds like there is a mismatch in the way node name is being
>> detected/calculated - which could either be a bug or a
>> misconfiguration.
>>
>> On Tue, Oct 30, 2012 at 12:46 AM, James Guthrie <[email protected]> wrote:
>>> Hi all,
>>>
>>> As mentioned in my previous e-mail, I get different results with
>>> different nodes as DC. I have now compiled a logfile when using r3 as
>>> DC, which is the case that always works. I looked into the difference
>>> between this situation and the previous logfiles. In both instances the
>>> same action is triggered but something different happens in both cases.
>>>
>>> corosync-r3-DC.log: http://pastebin.com/axSRfzEJ
>>> corosync-r4-DC.log: http://pastebin.com/SETtqnZM
>>>
>>> On line 567 of r3-DC.log and 572 of r4-DC.log the same thing happens:
>>>
>>> crmd: info: abort_transition_graph: do_te_invoke:156 -
>>> Triggered transition abort (complete=1) : Peer Cancelled
>>>
>>> With r4 as DC the following takes place (lines 600-620 of r4-DC.log -
>>> date and other unnecessary information removed):
>>>
>>> te_update_diff:126 - Triggered transition abort (complete=1, tag=diff,
>>> id=(null), magic=NA, cib=0.385.1) : Non-status change
>>> Cause <diff crm_feature_set="3.0.6" >
>>> Cause   <diff-removed admin_epoch="0" epoch="384" num_updates="7" >
>>> Cause     <cib admin_epoch="0" epoch="384" num_updates="7" >
>>> Cause       <configuration >
>>> Cause         <nodes >
>>> Cause           <node uname="r3" id="1" />
>>> Cause         </nodes>
>>> Cause       </configuration>
>>> Cause     </cib>
>>> Cause   </diff-removed>
>>> Cause   <diff-added >
>>> Cause     <cib epoch="385" num_updates="1" admin_epoch="0"
>>> validate-with="pacemaker-1.2" crm_feature_set="3.0.6" update-origin="r4"
>>> update-client="crmd" cib-last-written="Mon Oct 29 13:41:16 2012"
>>> have-quorum="1" dc-uuid="2" >
>>> Cause       <configuration >
>>> Cause         <nodes >
>>> Cause           <node id="1" uname="r3-eth1" />
>>> Cause         </nodes>
>>> Cause       </configuration>
>>> Cause     </cib>
>>> Cause   </diff-added>
>>> Cause </diff>
>>>
>>> which appears to remove the node from the CIB.
>>>
>>> In the case of r3 as DC, the above doesn't happen, the node remains
>>> online and is then shortly assigned resources.
>>>
>>> Could anyone suggest a reason for the different behaviour in these cases?
>>>
>>> Regards,
>>> James
>>>
>>>
>>> On 10/29/2012 01:51 PM, James Guthrie wrote:
>>>> Hi Michael,
>>>>
>>>> I have managed to successfully configure corosync with udpu, it
>>>> unfortunately hasn't made a difference in the behaviour of the cluster.
>>>>
>>>> I have found that I don't even need to restart the host in order to get
>>>> this behaviour - all I need to do is stop and restart corosync and
>>>> pacemaker on *one* of the hosts. To be precise: I've been able to narrow
>>>> it down to only one of the two hosts (r3). If I reboot the host, or
>>>> restart the services on r4 everything works fine. If I try the same with
>>>> r3, I have problems.
>>>>
>>>> I feel as though the answer may lie in the logfiles, the
>>>> intercommunication between the individual components of the HA software
>>>> makes it a bit difficult to accurately read the logfiles as an outsider
>>>> to this software. I have attached the logs of both r3 and r4 after
>>>> reproducing this effect this afternoon, they are much shorter to read
>>>> than those previously:
>>>>
>>>> corosync-r3.log: http://pastebin.com/ZAhh5nax
>>>> corosync-r4.log: http://pastebin.com/SETtqnZM
>>>>
>>>> Are there any other steps I could take in debugging this behaviour?
>>>>
>>>> Regards,
>>>> James
>>>>
>>>> On 10/26/2012 04:33 PM, Michael Schwartzkopff wrote:
>>>>>> Hi Michael,
>>>>>>
>>>>>> I'm working with a Linux From Scratch based kernel (version 3.4.7)
>>>>>> running in a virtual machine and with virtual switches.
>>>>> (...)
>>>>>> `tcpdump -ni eth1 port 5404` returns:
>>>>>>
>>>>>> listening on eth1, link-type EN10MB (Ethernet), capture size 65535 bytes
>>>>>> 16:22:27.849551 IP 192.168.200.166.5404 > 224.0.0.18.5405: UDP, length 87
>>>>>> 16:22:28.210578 IP 192.168.200.166.5404 > 224.0.0.18.5405: UDP, length 87
>>>>>> 16:22:28.770181 IP 192.168.200.166.5404 > 224.0.0.18.5405: UDP, length 87
>>>>>> 16:22:28.989802 IP 192.168.200.166.5404 > 224.0.0.18.5405: UDP, length 87
>>>>>> 16:22:29.370684 IP 192.168.200.166.5404 > 224.0.0.18.5405: UDP, length 87
>>>>>> 16:22:29.751062 IP 192.168.200.166.5404 > 224.0.0.18.5405: UDP, length 87
>>>>>>
>>>>>> Every now and then there is a packet from r4 (192.168.200.170), it does
>>>>>> appear as though r4 is quite quiet though.
>>>>>
>>>>> Ah. No packets from 192.168.200.166 unicast? Please try to configure
>>>>> unicast in your corosync configuration. See the udpu README file of
>>>>> corosync.
>>>>>
>>>>> I had the same problem and the cause was that the virtual bridge or KVM
>>>>> dropped all multicast packets.
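
To make the "who is sending, and to which kind of address" question from the capture above easier to answer on a longer trace, here is a minimal Python sketch that tallies packets per source and per destination type. It is not part of corosync or tcpdump; the function name tally_packets and the regular expression are only illustrative, and it assumes IPv4 lines in the `tcpdump -n` format quoted above.

```python
import ipaddress
import re
from collections import Counter

# Matches the "IP src.port > dst.port:" part of a `tcpdump -n` IPv4 line.
LINE_RE = re.compile(
    r"IP (\d+\.\d+\.\d+\.\d+)\.(\d+) > (\d+\.\d+\.\d+\.\d+)\.(\d+):"
)


def tally_packets(lines):
    """Count packets per (source address, 'multicast' or 'unicast')."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # header line or anything that is not an IPv4 packet
        src, dst = match.group(1), match.group(3)
        kind = "multicast" if ipaddress.ip_address(dst).is_multicast else "unicast"
        counts[(src, kind)] += 1
    return counts


# First three packets of the capture quoted in this thread.
sample = """\
listening on eth1, link-type EN10MB (Ethernet), capture size 65535 bytes
16:22:27.849551 IP 192.168.200.166.5404 > 224.0.0.18.5405: UDP, length 87
16:22:28.210578 IP 192.168.200.166.5404 > 224.0.0.18.5405: UDP, length 87
16:22:28.770181 IP 192.168.200.166.5404 > 224.0.0.18.5405: UDP, length 87
""".splitlines()

for (src, kind), n in sorted(tally_packets(sample).items()):
    print(f"{src} {kind} {n}")
```

Run against a full capture from each node, a source that only ever shows up under "multicast", or a peer that barely shows up at all, would be consistent with the bridge dropping multicast traffic as described above.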
>>>>>
>>>>> Greetings,
>>>>>
>>>>> _______________________________________________
>>>>> Linux-HA mailing list
>>>>> [email protected]
>>>>> http://lists.linux-ha.org/mailman/listinfo/linux-ha
>>>>> See also: http://linux-ha.org/ReportingProblems
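
For anyone comparing the two DC logs by hand: the CIB diff quoted in the 10/30 mail seems to show node id 1 going from uname "r3" to "r3-eth1", which would fit a node-name mismatch. The following is a minimal Python sketch for checking that mechanically. It assumes the "Cause" log prefixes have already been stripped, the <cib> attributes in the sample are abridged from the quoted log, and the helper names node_names and renamed_nodes are made up for this example; this is not a Pacemaker tool.

```python
import xml.etree.ElementTree as ET

# Diff from the quoted r4-DC.log, "Cause" prefixes removed, <cib> attributes abridged.
DIFF = """\
<diff crm_feature_set="3.0.6">
  <diff-removed admin_epoch="0" epoch="384" num_updates="7">
    <cib admin_epoch="0" epoch="384" num_updates="7">
      <configuration>
        <nodes>
          <node uname="r3" id="1"/>
        </nodes>
      </configuration>
    </cib>
  </diff-removed>
  <diff-added>
    <cib epoch="385" num_updates="1" admin_epoch="0">
      <configuration>
        <nodes>
          <node id="1" uname="r3-eth1"/>
        </nodes>
      </configuration>
    </cib>
  </diff-added>
</diff>
"""


def node_names(section):
    """Map node id -> uname for every <node> under one diff section."""
    if section is None:
        return {}
    return {node.get("id"): node.get("uname") for node in section.iter("node")}


def renamed_nodes(diff_xml):
    """Return {id: (old_uname, new_uname)} for nodes whose uname changed."""
    root = ET.fromstring(diff_xml)
    old = node_names(root.find("diff-removed"))
    new = node_names(root.find("diff-added"))
    return {
        node_id: (old[node_id], new[node_id])
        for node_id in old.keys() & new.keys()
        if old[node_id] != new[node_id]
    }


print(renamed_nodes(DIFF))  # {'1': ('r3', 'r3-eth1')}
```

A node id that appears on both sides with a different uname points at a rename rather than a removal, which is the case worth including in the bug report together with the logs and the corosync config.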
