Nils Goroll wrote: > Hi All, > > Peter Memishian has recommended that I ask the NFS community rather > than the networking community, see > http://opensolaris.org/jive/thread.jspa?threadID=97367&tstart=0 for > the original thread. > > I have written a fix for the issue initially documented in 6696163, > for which I > have now opened 6817942 to get one bug for one issue. > Nils, I'll take a look at this and get back to you early next week.
Just for my understanding, whats the background/motivation in your interest in increasing the max number of connections? Mahesh > The root cause is documented in 6696163: When choosing a connection > for an RPC > call, the old connmgr_get code used lbolt timestamps which don't have > enough > precision (nowadays) to yield proper load balancing. > > I have implemented round robin load balancing: Whenever a connection is > requested via connmgr_get, it is noted as last used. Upon the next > connection > request, the connection which comes next in the list of connections > (or the > first one if the last one was used last) is returned. > > I have left a comment in the code explaining why I chose the > particular approach > and what I think should be done when someone comes to do a larger > rewrite of the > code in question. > > I would appreciate reviews of my suggested fix, which is available at > > http://cr.opensolaris.org/~nigoroll/rpc_loadbalancing_6817942/ > > Thanks, Nils > > -- > > Tests I conducted: > > #### PREPARATIONS > >> clnt_max_conns/D > clnt_max_conns: > clnt_max_conns: 4 > >> rpclog/W b > rpclog: 0 = 0xb > > ifconfig e1000g0 addif 192.168.77.100/24 up deprecated > share -o rw=localhost:192.168.77.29,root=localhost:192.168.77.29 \ > /var/tmp/s > mkdir /tmp/c > cd ; umount /tmp/c ; \ > mount -o vers=3,proto=tcp 192.168.77.100:/var/tmp/s /tmp/c ; \ > cd /tmp/c > > #### TEST BALANCING > >> rpclog/W 2 > rpclog: 0 = 0x2 >> clnt_max_conns/D > clnt_max_conns: > clnt_max_conns: 4 > > - mount share (see above) > - ls -als > > haggis:/tmp/c# egrep '19:57:.*call going out' /var/adm/messages > Mar 16 19:58:10 haggis rpcmod: [ID 595603 kern.notice] clnt_tli_kinit: > COTS selected > Mar 16 19:58:10 haggis rpcmod: [ID 359612 kern.notice] > clnt_cots_kcallit, procnum 4 > Mar 16 19:58:10 haggis rpcmod: [ID 832978 kern.notice] clnt_cots_kcallit: > wait.tv_sec: 60 > Mar 16 19:58:10 haggis rpcmod: [ID 359371 kern.notice] clnt_cots_kcallit: > wait.tv_usec: 0 > Mar 16 19:57:22 haggis rpcmod: [ID 757553 kern.notice] connmgr_get: > call going > out on ffffff01f4598c38 > Mar 16 19:57:26 haggis rpcmod: [ID 757553 kern.notice] connmgr_get: > call going > out on ffffff01f459b3c8 > Mar 16 19:57:26 haggis rpcmod: [ID 757553 kern.notice] connmgr_get: > call going > out on ffffff01f459b5f0 > Mar 16 19:57:26 haggis rpcmod: [ID 757553 kern.notice] connmgr_get: > call going > out on ffffff01f459b760 > Mar 16 19:57:26 haggis rpcmod: [ID 757553 kern.notice] connmgr_get: > call going > out on ffffff01f4598c38 > Mar 16 19:57:26 haggis rpcmod: [ID 757553 kern.notice] connmgr_get: > call going > out on ffffff01f459b3c8 > Mar 16 19:57:26 haggis rpcmod: [ID 757553 kern.notice] connmgr_get: > call going > out on ffffff01f459b5f0 > Mar 16 19:57:26 haggis rpcmod: [ID 757553 kern.notice] connmgr_get: > call going > out on ffffff01f459b760 > Mar 16 19:57:26 haggis rpcmod: [ID 757553 kern.notice] connmgr_get: > call going > out on ffffff01f4598c38 > Mar 16 19:57:26 haggis rpcmod: [ID 757553 kern.notice] connmgr_get: > call going > out on ffffff01f459b3c8 > Mar 16 19:57:26 haggis rpcmod: [ID 757553 kern.notice] connmgr_get: > call going > out on ffffff01f459b5f0 > Mar 16 19:57:26 haggis rpcmod: [ID 757553 kern.notice] connmgr_get: > call going > out on ffffff01f459b760 > Mar 16 19:57:26 haggis rpcmod: [ID 757553 kern.notice] connmgr_get: > call going > out on ffffff01f4598c38 > [...etc...] > > #### Test balancing throughput (can't to a real performance test on a > single > #### laptop, this is more a test against massive performance regression) > > haggis:/tmp/c# iostat -xznM 5 & dd if=1g_1 of=/dev/zero > bs=$((1024*1024))& > [2] 100689 > [3] 100690 > haggis:/tmp/c# extended device statistics > r/s w/s Mr/s Mw/s wait actv wsvc_t asvc_t %w %b device > 24.1 7.5 1.1 0.1 0.8 0.3 24.3 8.1 12 20 c5t0d0 > 0.0 0.0 0.0 0.0 0.0 0.0 0.0 122.5 0 0 > 192.168.77.100:/var/tmp/s > extended device statistics > r/s w/s Mr/s Mw/s wait actv wsvc_t asvc_t %w %b device > 486.4 10.9 60.6 0.1 2.4 28.1 4.7 56.5 77 100 c5t0d0 > 1907.1 0.0 59.6 0.0 0.4 1.8 0.2 0.9 18 97 > 192.168.77.100:/var/tmp/s > extended device statistics > r/s w/s Mr/s Mw/s wait actv wsvc_t asvc_t %w %b device > 206.5 1.2 25.8 0.0 2.4 11.8 11.7 56.7 40 40 c5t0d0 > 4199.2 0.0 131.2 0.0 0.5 2.2 0.1 0.5 33 97 > 192.168.77.100:/var/tmp/s > 1024+0 records in > 1024+0 records out > extended device statistics > r/s w/s Mr/s Mw/s wait actv wsvc_t asvc_t %w %b device > 0.4 0.0 0.0 0.0 0.0 0.0 0.0 5.4 0 0 c5t0d0 > 461.0 0.0 14.4 0.0 0.1 0.2 0.1 0.4 4 8 > 192.168.77.100:/var/tmp/s > > #### Is Dooming working OK? > >> rpclog/W 8 > rpclog: 0 = 0x8 >> clnt_max_conns/W 8 > clnt_max_conns: 0x4 = 0x8 > > * ls -als on share > > haggis:/tmp/c# netstat -an| egrep '^192.168.77.100.2049' > 192.168.77.100.2049 192.168.77.29.1023 49152 0 49304 0 > ESTABLISHED > 192.168.77.100.2049 192.168.77.29.1022 49152 0 49304 0 > ESTABLISHED > 192.168.77.100.2049 192.168.77.29.1021 49152 0 49308 0 > ESTABLISHED > 192.168.77.100.2049 192.168.77.29.1020 49152 0 49304 0 > ESTABLISHED > 192.168.77.100.2049 192.168.77.29.1019 49152 0 49304 0 > ESTABLISHED > 192.168.77.100.2049 192.168.77.29.1018 49152 0 49304 0 > ESTABLISHED > 192.168.77.100.2049 192.168.77.29.1017 49152 0 49304 0 > ESTABLISHED > 192.168.77.100.2049 192.168.77.29.1016 49152 0 49304 0 > ESTABLISHED > >> clnt_max_conns/W 4 > clnt_max_conns: 0x8 = 0x4 > > haggis:/tmp/c# ls -als > total 7235597 > 3 drwxr-xr-x 2 root root 10 Mar 16 10:47 . > 8 drwxrwxrwt 8 root sys 720 Mar 16 19:57 .. > 878969 -rw------- 1 root root 1073741824 Mar 16 10:49 1g_1 > 794225 -rw------- 1 root root 1073741824 Mar 16 10:49 1g_2 > 1130659 -rw------- 1 root root 1073741824 Mar 16 10:49 1g_3 > 865401 -rw------- 1 root root 1073741824 Mar 16 10:49 1g_4 > 778861 -rw------- 1 root root 1073741824 Mar 16 10:49 1g_5 > 953995 -rw------- 1 root root 1073741824 Mar 16 10:49 1g_6 > 969355 -rw------- 1 root root 1073741824 Mar 16 10:49 1g_7 > 864121 -rw------- 1 root root 1073741824 Mar 16 10:49 1g_8 > haggis:/tmp/c# Mar 16 20:01:56 haggis rpcmod: [ID 882835 kern.notice] > connmgr_get: too many conns, dooming entry ffffff01e0ca9240 > Mar 16 20:01:56 haggis rpcmod: [ID 882835 kern.notice] connmgr_get: > too many > conns, dooming entry ffffff0205a14280 > Mar 16 20:01:56 haggis rpcmod: [ID 882835 kern.notice] connmgr_get: > too many > conns, dooming entry ffffff01fed6d700 > Mar 16 20:01:56 haggis rpcmod: [ID 882835 kern.notice] connmgr_get: > too many > conns, dooming entry ffffff01f468d380 > > ... waiting a while ... > > Mar 16 20:05:12 haggis rpcmod: [ID 772401 kern.notice] connmgr_sndrel: > sending > ordrel to queue ffffff020b4a7e50 > Mar 16 20:05:12 haggis rpcmod: [ID 776223 kern.notice] mir_wput_other wq > 0xffffff020b4a7e50: got T_ORDREL_REQ > Mar 16 20:05:12 haggis rpcmod: [ID 776223 kern.notice] mir_wput_other wq > 0xffffff020b6cde40: got T_ORDREL_REQ > netstat -an| egrep '^192.168.77.100.2049' > 192.168.77.100.2049 192.168.77.29.1022 49152 0 49304 0 > ESTABLISHED > 192.168.77.100.2049 192.168.77.29.1021 49152 0 49308 0 > ESTABLISHED > 192.168.77.100.2049 192.168.77.29.1020 49152 0 49304 0 > ESTABLISHED > 192.168.77.100.2049 192.168.77.29.1019 49152 0 49304 0 > ESTABLISHED > 192.168.77.100.2049 192.168.77.29.1018 49152 0 49304 0 > ESTABLISHED > 192.168.77.100.2049 192.168.77.29.1017 49152 0 49304 0 > ESTABLISHED > 192.168.77.100.2049 192.168.77.29.1016 49152 0 49304 0 > ESTABLISHED > haggis:/tmp/c# Mar 16 20:05:46 haggis rpcmod: [ID 772401 kern.notice] > connmgr_sndrel: sending ordrel to queue ffffff02039093b0 > Mar 16 20:05:46 haggis rpcmod: [ID 776223 kern.notice] mir_wput_other wq > 0xffffff02039093b0: got T_ORDREL_REQ > Mar 16 20:05:46 haggis rpcmod: [ID 776223 kern.notice] mir_wput_other wq > 0xffffff020b6c88f8: got T_ORDREL_REQ > Mar 16 20:06:02 haggis rpcmod: [ID 772401 kern.notice] connmgr_sndrel: > sending > ordrel to queue ffffff0202e440f8 > Mar 16 20:06:02 haggis rpcmod: [ID 776223 kern.notice] mir_wput_other wq > 0xffffff0202e440f8: got T_ORDREL_REQ > Mar 16 20:06:02 haggis rpcmod: [ID 776223 kern.notice] mir_wput_other wq > 0xffffff020b6cd3a0: got T_ORDREL_REQ > Mar 16 20:06:03 haggis rpcmod: [ID 772401 kern.notice] connmgr_sndrel: > sending > ordrel to queue ffffff02030cdba0 > Mar 16 20:06:03 haggis rpcmod: [ID 776223 kern.notice] mir_wput_other wq > 0xffffff02030cdba0: got T_ORDREL_REQ > Mar 16 20:06:03 haggis rpcmod: [ID 776223 kern.notice] mir_wput_other wq > 0xffffff02038798f0: got T_ORDREL_REQ > netstat -an| egrep '^192.168.77.100.2049' > 192.168.77.100.2049 192.168.77.29.1019 49152 0 49304 0 > ESTABLISHED > 192.168.77.100.2049 192.168.77.29.1018 49152 0 49304 0 > ESTABLISHED > 192.168.77.100.2049 192.168.77.29.1017 49152 0 49304 0 > ESTABLISHED > 192.168.77.100.2049 192.168.77.29.1016 49152 0 49304 0 > ESTABLISHED > > > #### Simple test with one connection only > >> clnt_max_conns/W 1 > clnt_max_conns: 0x4 = 0x1 > > haggis:/tmp/c# iostat -xznM 5& dd if=1g_3 of=/dev/null bs=$((1024*1024)) > [2] 100717 > extended device statistics > r/s w/s Mr/s Mw/s wait actv wsvc_t asvc_t %w %b device > 17.5 4.6 1.4 0.1 0.4 0.5 16.8 24.7 6 11 c5t0d0 > 98.8 0.0 3.1 0.0 0.0 0.0 0.1 0.5 0 3 > 192.168.77.100:/var/tmp/s > extended device statistics > r/s w/s Mr/s Mw/s wait actv wsvc_t asvc_t %w %b device > 533.7 0.0 66.7 0.0 24.8 5.6 46.5 10.5 91 98 c5t0d0 > 1916.6 0.0 59.7 0.0 0.4 2.3 0.2 1.2 21 96 > 192.168.77.100:/var/tmp/s > kill %2 > extended device statistics > r/s w/s Mr/s Mw/s wait actv wsvc_t asvc_t %w %b device > 350.4 0.0 43.8 0.0 19.8 4.5 56.5 12.7 73 76 c5t0d0 > 2958.3 0.0 90.5 0.0 0.3 2.4 0.1 0.8 21 97 > 192.168.77.100:/var/tmp/s > 1024+0 records in > 1024+0 records out > haggis:/tmp/c# kill %2 > > haggis:/tmp/c# ls -als > total 7235597 > 3 drwxr-xr-x 2 root root 10 Mar 16 10:47 . > 8 drwxrwxrwt 9 root sys 1317 Mar 16 20:38 .. > 878969 -rw------- 1 root root 1073741824 Mar 16 10:49 1g_1 > 794225 -rw------- 1 root root 1073741824 Mar 16 10:49 1g_2 > 1130659 -rw------- 1 root root 1073741824 Mar 16 10:49 1g_3 > 865401 -rw------- 1 root root 1073741824 Mar 16 10:49 1g_4 > 778861 -rw------- 1 root root 1073741824 Mar 16 10:49 1g_5 > 953995 -rw------- 1 root root 1073741824 Mar 16 10:49 1g_6 > 969355 -rw------- 1 root root 1073741824 Mar 16 10:49 1g_7 > 864121 -rw------- 1 root root 1073741824 Mar 16 10:49 1g_8 > haggis:/tmp/c# netstat -an| egrep '^192.168.77.100.2049' > 192.168.77.100.2049 192.168.77.29.1015 49152 0 49304 0 > ESTABLISHED > > #### test with >1 destination address > > haggis:/tmp/d# mkdir /tmp/e > haggis:/tmp/d# mount -o vers=3,proto=tcp 192.168.77.29:/var/tmp/s /tmp/e > haggis:/tmp/d# netstat -an| egrep '^192.168.77.(29|100).2049' > 192.168.77.100.2049 192.168.77.29.1015 49152 0 49304 0 > ESTABLISHED > 192.168.77.100.2049 192.168.77.29.1014 49152 0 49308 0 > ESTABLISHED > 192.168.77.100.2049 192.168.77.29.1013 49152 0 49304 0 > ESTABLISHED > 192.168.77.100.2049 192.168.77.29.1012 49152 0 49304 0 > ESTABLISHED > 192.168.77.29.2049 192.168.77.29.1011 49152 0 49304 0 > ESTABLISHED > 192.168.77.29.2049 192.168.77.29.1010 49152 0 49304 0 > ESTABLISHED > haggis:/tmp/d# cd /tmp/e > haggis:/tmp/e# ls -als > total 7235597 > 3 drwxr-xr-x 2 root root 10 Mar 16 10:47 . > 8 drwxrwxrwt 11 root sys 1433 Mar 16 20:41 .. > 878969 -rw------- 1 root root 1073741824 Mar 16 10:49 1g_1 > 794225 -rw------- 1 root root 1073741824 Mar 16 10:49 1g_2 > 1130659 -rw------- 1 root root 1073741824 Mar 16 10:49 1g_3 > 865401 -rw------- 1 root root 1073741824 Mar 16 10:49 1g_4 > 778861 -rw------- 1 root root 1073741824 Mar 16 10:49 1g_5 > 953995 -rw------- 1 root root 1073741824 Mar 16 10:49 1g_6 > 969355 -rw------- 1 root root 1073741824 Mar 16 10:49 1g_7 > 864121 -rw------- 1 root root 1073741824 Mar 16 10:49 1g_8 > haggis:/tmp/e# netstat -an| egrep '^192.168.77.(29|100).2049' > 192.168.77.100.2049 192.168.77.29.1015 49152 0 49304 0 > ESTABLISHED > 192.168.77.100.2049 192.168.77.29.1014 49152 0 49308 0 > ESTABLISHED > 192.168.77.100.2049 192.168.77.29.1013 49152 0 49304 0 > ESTABLISHED > 192.168.77.100.2049 192.168.77.29.1012 49152 0 49304 0 > ESTABLISHED > 192.168.77.29.2049 192.168.77.29.1011 49152 0 49304 0 > ESTABLISHED > 192.168.77.29.2049 192.168.77.29.1010 49152 0 49304 0 > ESTABLISHED > 192.168.77.29.2049 192.168.77.29.1009 49152 0 49308 0 > ESTABLISHED > 192.168.77.29.2049 192.168.77.29.1008 49152 0 49308 0 > ESTABLISHED > _______________________________________________ > nfs-discuss mailing list > nfs-discuss at opensolaris.org