Re: [Linux-HA] crm_failcount queries quite slow?

Dominik Klein Thu, 03 Apr 2008 23:26:07 -0700

Lars Marowsky-Bree wrote:

On 2008-04-03T13:59:36, Dejan Muhamedagic <[EMAIL PROTECTED]> wrote:

Any crm* program is significantly slower on a non-DC node
regardless of whether something's happening in the cluster. It's
always been like that.


I can confirm that. It's been that way for me ever since I started using heartbeat.

Hm, I've not personally observed that in my test cluster, or at least
not noticed anything out of line.

"Significantly" slower is bad; we mandate that "DC or not DC" is _not_
the question, and that users shouldn't care about this designation.

Could anyone who reproduces this report a few more details? Is it the
local node, the time it takes to process on the DC, or the network
roundtrip? (Should be observable using tcpdump/wireshark)


Just 2 measurements:

dktest2sles10:~# time crmadmin -D
Designated Controller is: dktest2sles10

real    0m0.005s
user    0m0.004s
sys     0m0.000s

dktest1sles10:~/cib# time crmadmin -D
Designated Controller is: dktest2sles10

real    0m1.014s
user    0m0.000s
sys     0m0.004s

dktest2sles10:~# time cibadmin -Q &> /dev/null

real    0m0.009s
user    0m0.004s
sys     0m0.004s

dktest1sles10:~/cib# time cibadmin -Q &> /dev/null

real    0m1.713s
user    0m0.004s
sys     0m0.004s
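
To put a number on the difference, the "real" values can be compared directly. This is only a rough sketch: to_seconds and slowdown are hypothetical helper names made up for illustration, not part of heartbeat or the crm tools, and the inputs are simply the cibadmin -Q timings copied from above.

```shell
#!/bin/sh
# Hypothetical helper: convert a bash `time` value such as "0m1.713s"
# into plain seconds (minutes * 60 + seconds).
to_seconds() {
    echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# Hypothetical helper: ratio of the second timing to the first,
# rounded to a whole number.
slowdown() {
    awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
        'BEGIN { printf "%.0f\n", b / a }'
}

# "real" values from the two cibadmin -Q runs (DC first, non-DC second).
to_seconds 0m1.713s
slowdown 0m0.009s 0m1.713s
```

With these two samples the non-DC query comes out at roughly 190 times the DC query. Single runs like this are noisy, so the ratio is only an order-of-magnitude indication.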

tcpdump:

y.x.z.103 is the DC
y.x.z.102 is the other node

08:22:16.803702 IP 10.200.200.102.32952 > 10.200.200.103.694: UDP, length 217
08:22:16.803626 IP 10.250.250.102.32951 > 10.250.250.103.694: UDP, length 221
08:22:16.803637 IP 10.250.250.102.32951 > 10.250.250.103.694: UDP, length 217
08:22:16.929482 IP 10.250.250.103.32869 > 10.250.250.102.694: UDP, length 221
08:22:16.929528 IP 10.200.200.103.32870 > 10.200.200.102.694: UDP, length 221

Up to here, it's been just the normal heartbeat packets, I think. Notice
the roughly identical length.


Then I do:

debian dktest1sles10:~/cib# date +%H:%M:%S:%N; time cibadmin -Q &> /dev/null
08:22:16:041111482

real    0m1.189s
user    0m0.008s
sys     0m0.00

08:22:16.929976 IP 10.250.250.103.32869 > 10.250.250.102.694: UDP, length 2263
08:22:16.930026 IP 10.200.200.103.32870 > 10.200.200.102.694: UDP, length 2263

08:22:16.930029 IP 10.200.200.103 > 10.200.200.102: udp
08:22:16.929979 IP 10.250.250.103 > 10.250.250.102: udp

Both servers received an ntpdate sync against the same time source a
minute earlier. So to me, it looks like it's the DC that needs some time
to process the request. The cluster had one primitive resource at that
time and should have been pretty much idle.
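
As a rough cross-check of that reading, the gap between the date output printed just before cibadmin -Q and the first large packet coming back from the DC can be computed from the two timestamps. This is a sketch under assumptions: stamp_to_seconds and elapsed are hypothetical helper names, the result is only meaningful if both clocks really were in sync, and it does not account for which node the capture was taken on.

```shell
#!/bin/sh
# Hypothetical helper: seconds since midnight for a timestamp written as
# "HH:MM:SS.frac" (tcpdump) or "HH:MM:SS:frac" (date +%H:%M:%S:%N).
stamp_to_seconds() {
    echo "$1" | awk -F'[:.]' \
        '{ printf "%.6f\n", $1 * 3600 + $2 * 60 + $3 + ("0." $4) }'
}

# Hypothetical helper: seconds elapsed from the first timestamp to the second.
elapsed() {
    awk -v a="$(stamp_to_seconds "$1")" -v b="$(stamp_to_seconds "$2")" \
        'BEGIN { printf "%.3f\n", b - a }'
}

# date output before the query, then the first 2263-byte packet from the DC.
elapsed 08:22:16:041111482 08:22:16.929976
```

For these two values the gap is about 0.889 seconds, which accounts for most of the 1.189 seconds of "real" time reported for the query.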


Regards
Dominik
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
