Lars Marowsky-Bree wrote:
> On 2008-04-03T13:59:36, Dejan Muhamedagic <[EMAIL PROTECTED]> wrote:
>> Any crm* program is significantly slower on a non-DC node
>> regardless of whether something's happening in the cluster. It's
>> always been like that.
>> I can confirm that. It's been like that for me ever since I started using heartbeat.
>
> Hm, I've not personally observed that in my test cluster, or at least
> not noticed anything out of line.
>
> "Significantly" slower is bad; we mandate that "DC or not DC" is _not_
> the question, and that users shouldn't care about this designation.
>
> Could anyone who reproduces this report a few more details? Is it the
> local node, the time it takes to process on the DC, or the network
> roundtrip? (This should be observable using tcpdump/wireshark.)

Here are two measurements, each taken on the DC (dktest2sles10) and on the non-DC node (dktest1sles10):
dktest2sles10:~# time crmadmin -D
Designated Controller is: dktest2sles10
real 0m0.005s
user 0m0.004s
sys 0m0.000s

dktest1sles10:~/cib# time crmadmin -D
Designated Controller is: dktest2sles10
real 0m1.014s
user 0m0.000s
sys 0m0.004s

dktest2sles10:~# time cibadmin -Q &> /dev/null
real 0m0.009s
user 0m0.004s
sys 0m0.004s

dktest1sles10:~/cib# time cibadmin -Q &> /dev/null
real 0m1.713s
user 0m0.004s
sys 0m0.004s
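The measurements above can be condensed with a little arithmetic. As a minimal sketch (the figures are copied from the four runs; the helper names `wait_time` and `summarize` are hypothetical, for illustration only), this compares each non-DC run with its DC counterpart and subtracts the command's own CPU time (user + sys) from its wall-clock time:

```python
# Timings copied from the four runs above, as (real, user, sys) in seconds.
# Hypothetical helper names; illustration only.
# "real" is wall-clock time; "user" + "sys" is CPU time charged to the
# command itself, so real - (user + sys) is time the command spent waiting.
RUNS = {
    ("crmadmin -D", "DC"): (0.005, 0.004, 0.000),
    ("crmadmin -D", "non-DC"): (1.014, 0.000, 0.004),
    ("cibadmin -Q", "DC"): (0.009, 0.004, 0.004),
    ("cibadmin -Q", "non-DC"): (1.713, 0.004, 0.004),
}


def wait_time(real: float, user: float, sys_: float) -> float:
    """Wall-clock time not accounted for by the command's own CPU usage."""
    return real - (user + sys_)


def summarize(runs):
    """Return (command, extra seconds on non-DC, non-DC wait seconds)."""
    rows = []
    for cmd in sorted({c for c, _ in runs}):
        dc = runs[(cmd, "DC")]
        non_dc = runs[(cmd, "non-DC")]
        extra = non_dc[0] - dc[0]
        rows.append((cmd, round(extra, 3), round(wait_time(*non_dc), 3)))
    return rows


if __name__ == "__main__":
    for cmd, extra, waited in summarize(RUNS):
        print(f"{cmd}: non-DC slower by {extra:.3f}s, "
              f"of which {waited:.3f}s was not spent on the command's own CPU")
```

On these numbers, nearly all of the extra time on the non-DC node is time the client spends waiting rather than computing. That alone does not say whether it is waiting on the local daemons, the network, or the DC, which is the question Lars raised.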
tcpdump (on both networks, 10.200.200.x and 10.250.250.x):
.103 is the DC (dktest2sles10)
.102 is the other node (dktest1sles10)
08:22:16.803702 IP 10.200.200.102.32952 > 10.200.200.103.694: UDP, length 217
08:22:16.803626 IP 10.250.250.102.32951 > 10.250.250.103.694: UDP, length 221
08:22:16.803637 IP 10.250.250.102.32951 > 10.250.250.103.694: UDP, length 217
08:22:16.929482 IP 10.250.250.103.32869 > 10.250.250.102.694: UDP, length 221
08:22:16.929528 IP 10.200.200.103.32870 > 10.200.200.102.694: UDP, length 221
Up to this point, these are just the normal heartbeat packets, I think. Note
their roughly identical lengths.
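The "roughly identical lengths" observation can also be checked mechanically by grouping the reported lengths per direction. This is a hedged sketch: the regular expression assumes exactly the one-line tcpdump format shown above, and anything else (such as the port-less "...: udp" lines further down) is simply skipped. The names `LINE_RE`, `SAMPLE`, `parse` and `lengths_by_direction` are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical names; illustration only.
# Matches one tcpdump line for a UDP datagram, e.g.
# "08:22:16.803702 IP 10.200.200.102.32952 > 10.200.200.103.694: UDP, length 217"
# (tcpdump appends the port to the IP address after a final dot).
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}:\d{2}:\d{2}\.\d+) IP "
    r"(?P<src>\d+\.\d+\.\d+\.\d+)\.(?P<sport>\d+) > "
    r"(?P<dst>\d+\.\d+\.\d+\.\d+)\.(?P<dport>\d+): UDP, length (?P<length>\d+)$"
)

# The five packets captured before the cibadmin run (copied from above).
SAMPLE = [
    "08:22:16.803702 IP 10.200.200.102.32952 > 10.200.200.103.694: UDP, length 217",
    "08:22:16.803626 IP 10.250.250.102.32951 > 10.250.250.103.694: UDP, length 221",
    "08:22:16.803637 IP 10.250.250.102.32951 > 10.250.250.103.694: UDP, length 217",
    "08:22:16.929482 IP 10.250.250.103.32869 > 10.250.250.102.694: UDP, length 221",
    "08:22:16.929528 IP 10.200.200.103.32870 > 10.200.200.102.694: UDP, length 221",
]


def parse(lines):
    """Yield (timestamp, src, dst, length) for every line LINE_RE matches.

    Lines in any other format are skipped rather than treated as errors.
    """
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            yield m["ts"], m["src"], m["dst"], int(m["length"])


def lengths_by_direction(lines):
    """Group the lengths tcpdump reports by (src, dst) address pair."""
    groups = defaultdict(list)
    for _ts, src, dst, length in parse(lines):
        groups[(src, dst)].append(length)
    return dict(groups)


if __name__ == "__main__":
    for (src, dst), lengths in sorted(lengths_by_direction(SAMPLE).items()):
        print(f"{src} -> {dst}: lengths {lengths}")
```

Fed the later capture, the same grouping would make a 2263-byte packet stand out against the 217/221-byte ones.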
Then I do:
dktest1sles10:~/cib# date +%H:%M:%S:%N; time cibadmin -Q &> /dev/null
08:22:16:041111482
real 0m1.189s
user 0m0.008s
sys 0m0.00
08:22:16.929976 IP 10.250.250.103.32869 > 10.250.250.102.694: UDP, length 2263
08:22:16.930026 IP 10.200.200.103.32870 > 10.200.200.102.694: UDP, length 2263
08:22:16.930029 IP 10.200.200.103 > 10.200.200.102: udp
08:22:16.929979 IP 10.250.250.103 > 10.250.250.102: udp
Both servers had been synced with ntpdate against the same time source a
minute earlier. So to me, it looks like it's the DC that needs some time
to process the request. The cluster had one primitive resource at that
time and should have been pretty much idle.
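Since the timing argument rests on comparing the `date` output with tcpdump timestamps, it may help to put all of them on one time axis. A minimal sketch, assuming (as the ntpdate sync suggests) that the clocks agree closely; the constants are copied from the message, and the names `seconds_since_midnight` and `timeline` are hypothetical:

```python
# Hypothetical names; illustration only.
# Places each quoted packet, and the end of cibadmin -Q, on one time axis
# relative to the `date` timestamp printed just before cibadmin started.
# Assumption: all clocks involved agree closely (ntpdate sync a minute
# earlier), so tcpdump timestamps and the `date` output are comparable.

CMD_START = "08:22:16:041111482"  # `date +%H:%M:%S:%N` output (colon before ns)
CMD_REAL = 1.189                  # "real" reported by `time cibadmin -Q`

# (tcpdump timestamp, description) for each packet quoted in the message.
PACKETS = [
    ("08:22:16.803702", "10.200.200.102 -> 10.200.200.103, length 217"),
    ("08:22:16.803626", "10.250.250.102 -> 10.250.250.103, length 221"),
    ("08:22:16.803637", "10.250.250.102 -> 10.250.250.103, length 217"),
    ("08:22:16.929482", "10.250.250.103 -> 10.250.250.102, length 221"),
    ("08:22:16.929528", "10.200.200.103 -> 10.200.200.102, length 221"),
    ("08:22:16.929976", "10.250.250.103 -> 10.250.250.102, length 2263"),
    ("08:22:16.930026", "10.200.200.103 -> 10.200.200.102, length 2263"),
]


def seconds_since_midnight(stamp: str) -> float:
    """Parse 'HH:MM:SS.ffffff' (tcpdump) or 'HH:MM:SS:nnnnnnnnn' (date +%N)."""
    hh, mm, rest = stamp.split(":", 2)
    ss, frac = rest.replace(":", ".").split(".")
    return int(hh) * 3600 + int(mm) * 60 + int(ss) + int(frac) / 10 ** len(frac)


def timeline(start, packets, real):
    """Return (offset_seconds, description) pairs sorted by offset."""
    t0 = seconds_since_midnight(start)
    rows = [(round(seconds_since_midnight(ts) - t0, 6), desc) for ts, desc in packets]
    rows.append((round(real, 6), "cibadmin -Q returns"))
    return sorted(rows)


if __name__ == "__main__":
    for offset, desc in timeline(CMD_START, PACKETS, CMD_REAL):
        print(f"+{offset:.6f}s  {desc}")
```

With these inputs the two 2263-byte packets land roughly 0.89 s after `date` ran, and the command returns at 1.189 s; attributing the gaps in between is exactly the local node / network / DC question asked above.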
Regards
Dominik
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems