Hi, sorry for the delay.

I have now tested with tip.tar.bz2.
> please reproduce with most recent heartbeat,
>     http://hg.linux-ha.org/heartbeat-STABLE_3_0/
>      (http:// ... /archive/tip.tar.bz2)
>
> There have been changes in the cleanup path for dead nodes.
>
>
> Did you clone one into the other,
> or do a clean, independent install of both?
No clones. Both machines are set up with the installer.
> Pacemaker version?
> Glue version?
Pacemaker is 1.0.9.1+hg15626-1 (from Debian 6.0.3)
Glue is 1.0.6 (from Debian 6.0.3)
running on Kernel 2.6.32

> How, exactly? Is this "symmetrical" or "asymmetrical", i.e. does that 
> block only incoming/only outgoing/both? Have both nodes seen the 
> respective other as dead?

> Does that block both links at the same time, or one after the other? 

I have now tested both symmetrical and asymmetrical blocking, with a 
delay of 2 sec. The error occurred once, after an asymmetrical 
disable/enable, but it does not occur every time.
Both nodes show the other as offline.
> Membership only sees _one_ node here, still.
> nodes=1; this somehow looks like stale data.
>
> What does /usr/lib/heartbeat/ccm_test_client say
> during such an experiment? both nodes?
It shows:

./ccm_testclient[27359]: 2011/11/17_12:53:50 info: NODES IN THE PRIMARY MEMBERSHIP
./ccm_testclient[27359]: 2011/11/17_12:53:50 info:      nodeid=0, uname=debian60-clnode1, born=514
./ccm_testclient[27359]: 2011/11/17_12:53:50 info: MY NODE IS A MEMBER OF THE MEMBERSHIP LIST
./ccm_testclient[27359]: 2011/11/17_12:53:50 info: NEW MEMBERS
./ccm_testclient[27359]: 2011/11/17_12:53:50 info:      NONE
./ccm_testclient[27359]: 2011/11/17_12:53:50 info: MEMBERS LOST
./ccm_testclient[27359]: 2011/11/17_12:53:50 info:      NONE
./ccm_testclient[27359]: 2011/11/17_12:53:50 info: -----------------------
./ccm_testclient[27359]: 2011/11/17_12:53:51 info: mem_handle_event: Got an event OC_EV_MS_INVALID from ccm
./ccm_testclient[27359]: 2011/11/17_12:53:51 info: mem_handle_event: no mbr_track info
./ccm_testclient[27359]: 2011/11/17_12:53:51 info: mem_handle_event: Got an event OC_EV_MS_NEW_MEMBERSHIP from ccm
./ccm_testclient[27359]: 2011/11/17_12:53:51 info: mem_handle_event: instance=515, nodes=1, new=0, lost=0, n_idx=0, new_idx=1, old_idx=3
./ccm_testclient[27359]: 2011/11/17_12:53:51 info: event=NEW MEMBERSHIP:
./ccm_testclient[27359]: 2011/11/17_12:53:51 info: instance=515
# ttl members=1, ttl_idx=0
# new members=0, new_idx=1
# out members=0, out_idx=3
./ccm_testclient[27359]: 2011/11/17_12:53:51 info: NODES IN THE PRIMARY MEMBERSHIP
./ccm_testclient[27359]: 2011/11/17_12:53:51 info:      nodeid=0, uname=debian60-clnode1, born=515
./ccm_testclient[27359]: 2011/11/17_12:53:51 info: MY NODE IS A MEMBER OF THE MEMBERSHIP LIST
./ccm_testclient[27359]: 2011/11/17_12:53:51 info: NEW MEMBERS
./ccm_testclient[27359]: 2011/11/17_12:53:51 info:      NONE
./ccm_testclient[27359]: 2011/11/17_12:53:51 info: MEMBERS LOST
./ccm_testclient[27359]: 2011/11/17_12:53:51 info:      NONE
./ccm_testclient[27359]: 2011/11/17_12:53:51 info: -----------------------
./ccm_testclient[27359]: 2011/11/17_12:53:52 info: mem_handle_event: Got an event OC_EV_MS_INVALID from ccm
./ccm_testclient[27359]: 2011/11/17_12:53:52 info: mem_handle_event: no mbr_track info
./ccm_testclient[27359]: 2011/11/17_12:53:52 info: mem_handle_event: Got an event OC_EV_MS_NEW_MEMBERSHIP from ccm
./ccm_testclient[27359]: 2011/11/17_12:53:52 info: mem_handle_event: instance=516, nodes=1, new=0, lost=0, n_idx=0, new_idx=1, old_idx=3
./ccm_testclient[27359]: 2011/11/17_12:53:52 info: event=NEW MEMBERSHIP:
./ccm_testclient[27359]: 2011/11/17_12:53:52 info: instance=516
# ttl members=1, ttl_idx=0
# new members=0, new_idx=1
# out members=0, out_idx=3
./ccm_testclient[27359]: 2011/11/17_12:53:52 info: NODES IN THE PRIMARY MEMBERSHIP
./ccm_testclient[27359]: 2011/11/17_12:53:52 info:      nodeid=0, uname=debian60-clnode1, born=516
./ccm_testclient[27359]: 2011/11/17_12:53:52 info: MY NODE IS A MEMBER OF THE MEMBERSHIP LIST
./ccm_testclient[27359]: 2011/11/17_12:53:52 info: NEW MEMBERS
./ccm_testclient[27359]: 2011/11/17_12:53:52 info:      NONE
./ccm_testclient[27359]: 2011/11/17_12:53:52 info: MEMBERS LOST
./ccm_testclient[27359]: 2011/11/17_12:53:52 info:      NONE
./ccm_testclient[27359]: 2011/11/17_12:53:52 info: -----------------------
./ccm_testclient[27359]: 2011/11/17_12:53:53 info: mem_handle_event: Got an event OC_EV_MS_INVALID from ccm
./ccm_testclient[27359]: 2011/11/17_12:53:53 info: mem_handle_event: no mbr_track info
./ccm_testclient[27359]: 2011/11/17_12:53:53 info: mem_handle_event: Got an event OC_EV_MS_NEW_MEMBERSHIP from ccm
./ccm_testclient[27359]: 2011/11/17_12:53:53 info: mem_handle_event: instance=517, nodes=1, new=0, lost=0, n_idx=0, new_idx=1, old_idx=3
./ccm_testclient[27359]: 2011/11/17_12:53:53 info: event=NEW MEMBERSHIP:
./ccm_testclient[27359]: 2011/11/17_12:53:53 info: instance=517
# ttl members=1, ttl_idx=0
# new members=0, new_idx=1
# out members=0, out_idx=3

.
.
.
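
To make the pattern in this output easier to follow (a new membership 
instance every second while nodes stays at 1 and nothing is reported as 
new or lost), here is a minimal Python sketch that extracts the 
mem_handle_event lines from saved ccm_testclient output. It is not part 
of heartbeat; the names membership_events and is_flapping are made up 
for illustration, and "flapping" is just my own rule of thumb (the 
instance number goes up although the membership did not change).

```python
import re

# Matches the "mem_handle_event: instance=..., nodes=..., ..." lines.
# \s+ also accepts a line break after "mem_handle_event:", so output that
# was wrapped by a mail client can be pasted in unchanged.
EVENT_RE = re.compile(
    r"(\d{4}/\d{2}/\d{2}_\d{2}:\d{2}:\d{2}) info: mem_handle_event:\s+"
    r"instance=(\d+), nodes=(\d+), new=(\d+), lost=(\d+)"
)


def membership_events(text):
    """Return (timestamp, instance, nodes, new, lost) tuples in log order."""
    return [
        (ts, int(inst), int(nodes), int(new), int(lost))
        for ts, inst, nodes, new, lost in EVENT_RE.findall(text)
    ]


def is_flapping(events):
    """Hypothetical check: the instance counter increases between two
    consecutive events although the node count is unchanged and the later
    event reports no new and no lost members."""
    for prev, cur in zip(events, events[1:]):
        same_nodes = prev[2] == cur[2]
        nothing_changed = cur[3] == 0 and cur[4] == 0
        if cur[1] > prev[1] and same_nodes and nothing_changed:
            return True
    return False


# Two events taken from the output above (wrapped as in the mail).
sample = """\
./ccm_testclient[27359]: 2011/11/17_12:53:51 info: mem_handle_event:
instance=515, nodes=1, new=0, lost=0, n_idx=0, new_idx=1, old_idx=3
./ccm_testclient[27359]: 2011/11/17_12:53:52 info: mem_handle_event:
instance=516, nodes=1, new=0, lost=0, n_idx=0, new_idx=1, old_idx=3
"""

events = membership_events(sample)
for ts, inst, nodes, new, lost in events:
    print(ts, "instance", inst, "nodes", nodes, "new", new, "lost", lost)
print("flapping:", is_flapping(events))
```

Running it over the complete output from both nodes should show whether 
the two sides disagree about the membership during the same seconds.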

> (or lib64, may be in the -dev package)
> btw, as long as it can talk to ccm,
> that does not terminate by itself,
> you need to ctrl-c it...
>
> I don't think so.
> Anything about lost packets?
>

> If you can "easily" reproduce (with most recent heartbeat),
> a tcpdump may be useful from, say, 10 heartbeats before you disable,
> to half a minute after you re-enable the network link.
> (or just crank up debugging so high that you even see message dumps in the 
> logs...)
>
>
I will have to sniff the traffic for that.

In this case, when the nodes showed each other as offline, I disconnected 
and reconnected the interfaces. After a few seconds they showed each 
other as online again.
After that, ccm_testclient shows:

./ccm_testclient[27417]: 2011/11/17_12:56:48 info: mem_handle_event: Got an event OC_EV_MS_NEW_MEMBERSHIP from ccm
./ccm_testclient[27417]: 2011/11/17_12:56:48 info: mem_handle_event: instance=574, nodes=2, new=2, lost=0, n_idx=0, new_idx=0, old_idx=4
./ccm_testclient[27417]: 2011/11/17_12:56:48 info: event=NEW MEMBERSHIP:
./ccm_testclient[27417]: 2011/11/17_12:56:48 info: instance=574
# ttl members=2, ttl_idx=0
# new members=2, new_idx=0
# out members=0, out_idx=4
./ccm_testclient[27417]: 2011/11/17_12:56:48 info: NODES IN THE PRIMARY MEMBERSHIP
./ccm_testclient[27417]: 2011/11/17_12:56:48 info:      nodeid=1, uname=debian60-clnode2, born=1
./ccm_testclient[27417]: 2011/11/17_12:56:48 info:      nodeid=0, uname=debian60-clnode1, born=574
./ccm_testclient[27417]: 2011/11/17_12:56:48 info: MY NODE IS A MEMBER OF THE MEMBERSHIP LIST
./ccm_testclient[27417]: 2011/11/17_12:56:48 info: NEW MEMBERS
./ccm_testclient[27417]: 2011/11/17_12:56:48 info:      nodeid=1, uname=debian60-clnode2, born=1
./ccm_testclient[27417]: 2011/11/17_12:56:48 info:      nodeid=0, uname=debian60-clnode1, born=574
./ccm_testclient[27417]: 2011/11/17_12:56:48 info: MEMBERS LOST
./ccm_testclient[27417]: 2011/11/17_12:56:48 info:      NONE
./ccm_testclient[27417]: 2011/11/17_12:56:48 info: -----------------------


I also checked the system times on both nodes. They are in sync.



_______________________________________________________
Linux-HA-Dev: [email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
