Nils Goroll wrote:
> Hi All,
>
> Peter Memishian has recommended that I ask the NFS community rather 
> than the networking community, see 
> http://opensolaris.org/jive/thread.jspa?threadID=97367&tstart=0 for 
> the original thread.
>
> I have written a fix for the issue initially documented in 6696163, 
> for which I
> have now opened 6817942 to get one bug for one issue.
>
Nils, I'll take a look at this and get back to you early next week.

Just for my understanding, whats the background/motivation in
your interest in increasing the max number of connections?

Mahesh
> The root cause is documented in 6696163: When choosing a connection 
> for an RPC
> call, the old connmgr_get code used lbolt timestamps which don't have 
> enough
> precision (nowadays) to yield proper load balancing.
>
> I have implemented round robin load balancing: Whenever a connection is
> requested via connmgr_get, it is noted as last used. Upon the next 
> connection
> request, the connection which comes next in the list of connections 
> (or the
> first one if the last one was used last) is returned.
>
> I have left a comment in the code explaining why I chose the 
> particular approach
> and what I think should be done when someone comes to do a larger 
> rewrite of the
> code in question.
>
> I would appreciate reviews of my suggested fix, which is available at
>
> http://cr.opensolaris.org/~nigoroll/rpc_loadbalancing_6817942/
>
> Thanks, Nils
>
> -- 
>
> Tests I conducted:
>
> #### PREPARATIONS
>
>> clnt_max_conns/D
> clnt_max_conns:
> clnt_max_conns: 4
>
>> rpclog/W b
> rpclog:         0               =       0xb
>
> ifconfig e1000g0 addif 192.168.77.100/24 up deprecated
> share -o rw=localhost:192.168.77.29,root=localhost:192.168.77.29 \
> /var/tmp/s
> mkdir /tmp/c
> cd ; umount /tmp/c ; \
> mount -o vers=3,proto=tcp 192.168.77.100:/var/tmp/s /tmp/c ; \
> cd /tmp/c
>
> #### TEST BALANCING
>
>> rpclog/W 2
> rpclog:         0               =       0x2
>> clnt_max_conns/D
> clnt_max_conns:
> clnt_max_conns: 4
>
> - mount share (see above)
> - ls -als
>
> haggis:/tmp/c# egrep '19:57:.*call going out' /var/adm/messages
> Mar 16 19:58:10 haggis rpcmod: [ID 595603 kern.notice] clnt_tli_kinit: 
> COTS selected
> Mar 16 19:58:10 haggis rpcmod: [ID 359612 kern.notice] 
> clnt_cots_kcallit, procnum 4
> Mar 16 19:58:10 haggis rpcmod: [ID 832978 kern.notice] clnt_cots_kcallit:
> wait.tv_sec: 60
> Mar 16 19:58:10 haggis rpcmod: [ID 359371 kern.notice] clnt_cots_kcallit:
> wait.tv_usec: 0
> Mar 16 19:57:22 haggis rpcmod: [ID 757553 kern.notice] connmgr_get: 
> call going
> out on ffffff01f4598c38
> Mar 16 19:57:26 haggis rpcmod: [ID 757553 kern.notice] connmgr_get: 
> call going
> out on ffffff01f459b3c8
> Mar 16 19:57:26 haggis rpcmod: [ID 757553 kern.notice] connmgr_get: 
> call going
> out on ffffff01f459b5f0
> Mar 16 19:57:26 haggis rpcmod: [ID 757553 kern.notice] connmgr_get: 
> call going
> out on ffffff01f459b760
> Mar 16 19:57:26 haggis rpcmod: [ID 757553 kern.notice] connmgr_get: 
> call going
> out on ffffff01f4598c38
> Mar 16 19:57:26 haggis rpcmod: [ID 757553 kern.notice] connmgr_get: 
> call going
> out on ffffff01f459b3c8
> Mar 16 19:57:26 haggis rpcmod: [ID 757553 kern.notice] connmgr_get: 
> call going
> out on ffffff01f459b5f0
> Mar 16 19:57:26 haggis rpcmod: [ID 757553 kern.notice] connmgr_get: 
> call going
> out on ffffff01f459b760
> Mar 16 19:57:26 haggis rpcmod: [ID 757553 kern.notice] connmgr_get: 
> call going
> out on ffffff01f4598c38
> Mar 16 19:57:26 haggis rpcmod: [ID 757553 kern.notice] connmgr_get: 
> call going
> out on ffffff01f459b3c8
> Mar 16 19:57:26 haggis rpcmod: [ID 757553 kern.notice] connmgr_get: 
> call going
> out on ffffff01f459b5f0
> Mar 16 19:57:26 haggis rpcmod: [ID 757553 kern.notice] connmgr_get: 
> call going
> out on ffffff01f459b760
> Mar 16 19:57:26 haggis rpcmod: [ID 757553 kern.notice] connmgr_get: 
> call going
> out on ffffff01f4598c38
> [...etc...]
>
> #### Test balancing throughput (can't to a real performance test on a 
> single
> #### laptop, this is more a test against massive performance regression)
>
> haggis:/tmp/c# iostat -xznM 5 & dd if=1g_1 of=/dev/zero 
> bs=$((1024*1024))&
> [2] 100689
> [3] 100690
> haggis:/tmp/c#                     extended device statistics
>     r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
>    24.1    7.5    1.1    0.1  0.8  0.3   24.3    8.1  12  20 c5t0d0
>     0.0    0.0    0.0    0.0  0.0  0.0    0.0  122.5   0   0
> 192.168.77.100:/var/tmp/s
>                     extended device statistics
>     r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
>   486.4   10.9   60.6    0.1  2.4 28.1    4.7   56.5  77 100 c5t0d0
>  1907.1    0.0   59.6    0.0  0.4  1.8    0.2    0.9  18  97
> 192.168.77.100:/var/tmp/s
>                     extended device statistics
>     r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
>   206.5    1.2   25.8    0.0  2.4 11.8   11.7   56.7  40  40 c5t0d0
>  4199.2    0.0  131.2    0.0  0.5  2.2    0.1    0.5  33  97
> 192.168.77.100:/var/tmp/s
> 1024+0 records in
> 1024+0 records out
>                     extended device statistics
>     r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
>     0.4    0.0    0.0    0.0  0.0  0.0    0.0    5.4   0   0 c5t0d0
>   461.0    0.0   14.4    0.0  0.1  0.2    0.1    0.4   4   8
> 192.168.77.100:/var/tmp/s
>
> #### Is Dooming working OK?
>
>> rpclog/W 8
> rpclog:         0               =       0x8
>> clnt_max_conns/W 8
> clnt_max_conns: 0x4             =       0x8
>
> * ls -als on share
>
> haggis:/tmp/c# netstat -an| egrep '^192.168.77.100.2049'
> 192.168.77.100.2049  192.168.77.29.1023   49152      0 49304      0 
> ESTABLISHED
> 192.168.77.100.2049  192.168.77.29.1022   49152      0 49304      0 
> ESTABLISHED
> 192.168.77.100.2049  192.168.77.29.1021   49152      0 49308      0 
> ESTABLISHED
> 192.168.77.100.2049  192.168.77.29.1020   49152      0 49304      0 
> ESTABLISHED
> 192.168.77.100.2049  192.168.77.29.1019   49152      0 49304      0 
> ESTABLISHED
> 192.168.77.100.2049  192.168.77.29.1018   49152      0 49304      0 
> ESTABLISHED
> 192.168.77.100.2049  192.168.77.29.1017   49152      0 49304      0 
> ESTABLISHED
> 192.168.77.100.2049  192.168.77.29.1016   49152      0 49304      0 
> ESTABLISHED
>
>> clnt_max_conns/W 4
> clnt_max_conns: 0x8             =       0x4
>
> haggis:/tmp/c# ls -als
> total 7235597
>    3 drwxr-xr-x   2 root     root          10 Mar 16 10:47 .
>    8 drwxrwxrwt   8 root     sys          720 Mar 16 19:57 ..
> 878969 -rw-------   1 root     root     1073741824 Mar 16 10:49 1g_1
> 794225 -rw-------   1 root     root     1073741824 Mar 16 10:49 1g_2
> 1130659 -rw-------   1 root     root     1073741824 Mar 16 10:49 1g_3
> 865401 -rw-------   1 root     root     1073741824 Mar 16 10:49 1g_4
> 778861 -rw-------   1 root     root     1073741824 Mar 16 10:49 1g_5
> 953995 -rw-------   1 root     root     1073741824 Mar 16 10:49 1g_6
> 969355 -rw-------   1 root     root     1073741824 Mar 16 10:49 1g_7
> 864121 -rw-------   1 root     root     1073741824 Mar 16 10:49 1g_8
> haggis:/tmp/c# Mar 16 20:01:56 haggis rpcmod: [ID 882835 kern.notice]
> connmgr_get: too many conns, dooming entry ffffff01e0ca9240
> Mar 16 20:01:56 haggis rpcmod: [ID 882835 kern.notice] connmgr_get: 
> too many
> conns, dooming entry ffffff0205a14280
> Mar 16 20:01:56 haggis rpcmod: [ID 882835 kern.notice] connmgr_get: 
> too many
> conns, dooming entry ffffff01fed6d700
> Mar 16 20:01:56 haggis rpcmod: [ID 882835 kern.notice] connmgr_get: 
> too many
> conns, dooming entry ffffff01f468d380
>
> ... waiting a while ...
>
> Mar 16 20:05:12 haggis rpcmod: [ID 772401 kern.notice] connmgr_sndrel: 
> sending
> ordrel to queue ffffff020b4a7e50
> Mar 16 20:05:12 haggis rpcmod: [ID 776223 kern.notice] mir_wput_other wq
> 0xffffff020b4a7e50: got T_ORDREL_REQ
> Mar 16 20:05:12 haggis rpcmod: [ID 776223 kern.notice] mir_wput_other wq
> 0xffffff020b6cde40: got T_ORDREL_REQ
> netstat -an| egrep '^192.168.77.100.2049'
> 192.168.77.100.2049  192.168.77.29.1022   49152      0 49304      0 
> ESTABLISHED
> 192.168.77.100.2049  192.168.77.29.1021   49152      0 49308      0 
> ESTABLISHED
> 192.168.77.100.2049  192.168.77.29.1020   49152      0 49304      0 
> ESTABLISHED
> 192.168.77.100.2049  192.168.77.29.1019   49152      0 49304      0 
> ESTABLISHED
> 192.168.77.100.2049  192.168.77.29.1018   49152      0 49304      0 
> ESTABLISHED
> 192.168.77.100.2049  192.168.77.29.1017   49152      0 49304      0 
> ESTABLISHED
> 192.168.77.100.2049  192.168.77.29.1016   49152      0 49304      0 
> ESTABLISHED
> haggis:/tmp/c# Mar 16 20:05:46 haggis rpcmod: [ID 772401 kern.notice]
> connmgr_sndrel: sending ordrel to queue ffffff02039093b0
> Mar 16 20:05:46 haggis rpcmod: [ID 776223 kern.notice] mir_wput_other wq
> 0xffffff02039093b0: got T_ORDREL_REQ
> Mar 16 20:05:46 haggis rpcmod: [ID 776223 kern.notice] mir_wput_other wq
> 0xffffff020b6c88f8: got T_ORDREL_REQ
> Mar 16 20:06:02 haggis rpcmod: [ID 772401 kern.notice] connmgr_sndrel: 
> sending
> ordrel to queue ffffff0202e440f8
> Mar 16 20:06:02 haggis rpcmod: [ID 776223 kern.notice] mir_wput_other wq
> 0xffffff0202e440f8: got T_ORDREL_REQ
> Mar 16 20:06:02 haggis rpcmod: [ID 776223 kern.notice] mir_wput_other wq
> 0xffffff020b6cd3a0: got T_ORDREL_REQ
> Mar 16 20:06:03 haggis rpcmod: [ID 772401 kern.notice] connmgr_sndrel: 
> sending
> ordrel to queue ffffff02030cdba0
> Mar 16 20:06:03 haggis rpcmod: [ID 776223 kern.notice] mir_wput_other wq
> 0xffffff02030cdba0: got T_ORDREL_REQ
> Mar 16 20:06:03 haggis rpcmod: [ID 776223 kern.notice] mir_wput_other wq
> 0xffffff02038798f0: got T_ORDREL_REQ
> netstat -an| egrep '^192.168.77.100.2049'
> 192.168.77.100.2049  192.168.77.29.1019   49152      0 49304      0 
> ESTABLISHED
> 192.168.77.100.2049  192.168.77.29.1018   49152      0 49304      0 
> ESTABLISHED
> 192.168.77.100.2049  192.168.77.29.1017   49152      0 49304      0 
> ESTABLISHED
> 192.168.77.100.2049  192.168.77.29.1016   49152      0 49304      0 
> ESTABLISHED
>
>
> #### Simple test with one connection only
>
>> clnt_max_conns/W 1
> clnt_max_conns: 0x4             =       0x1
>
> haggis:/tmp/c# iostat -xznM 5& dd if=1g_3 of=/dev/null bs=$((1024*1024))
> [2] 100717
>                     extended device statistics
>     r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
>    17.5    4.6    1.4    0.1  0.4  0.5   16.8   24.7   6  11 c5t0d0
>    98.8    0.0    3.1    0.0  0.0  0.0    0.1    0.5   0   3
> 192.168.77.100:/var/tmp/s
>                     extended device statistics
>     r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
>   533.7    0.0   66.7    0.0 24.8  5.6   46.5   10.5  91  98 c5t0d0
>  1916.6    0.0   59.7    0.0  0.4  2.3    0.2    1.2  21  96
> 192.168.77.100:/var/tmp/s
> kill %2
>                     extended device statistics
>     r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
>   350.4    0.0   43.8    0.0 19.8  4.5   56.5   12.7  73  76 c5t0d0
>  2958.3    0.0   90.5    0.0  0.3  2.4    0.1    0.8  21  97
> 192.168.77.100:/var/tmp/s
> 1024+0 records in
> 1024+0 records out
> haggis:/tmp/c# kill %2
>
> haggis:/tmp/c# ls -als
> total 7235597
>    3 drwxr-xr-x   2 root     root          10 Mar 16 10:47 .
>    8 drwxrwxrwt   9 root     sys         1317 Mar 16 20:38 ..
> 878969 -rw-------   1 root     root     1073741824 Mar 16 10:49 1g_1
> 794225 -rw-------   1 root     root     1073741824 Mar 16 10:49 1g_2
> 1130659 -rw-------   1 root     root     1073741824 Mar 16 10:49 1g_3
> 865401 -rw-------   1 root     root     1073741824 Mar 16 10:49 1g_4
> 778861 -rw-------   1 root     root     1073741824 Mar 16 10:49 1g_5
> 953995 -rw-------   1 root     root     1073741824 Mar 16 10:49 1g_6
> 969355 -rw-------   1 root     root     1073741824 Mar 16 10:49 1g_7
> 864121 -rw-------   1 root     root     1073741824 Mar 16 10:49 1g_8
> haggis:/tmp/c# netstat -an| egrep '^192.168.77.100.2049'
> 192.168.77.100.2049  192.168.77.29.1015   49152      0 49304      0 
> ESTABLISHED
>
> #### test with >1 destination address
>
> haggis:/tmp/d# mkdir /tmp/e
> haggis:/tmp/d# mount -o vers=3,proto=tcp 192.168.77.29:/var/tmp/s /tmp/e
> haggis:/tmp/d# netstat -an| egrep '^192.168.77.(29|100).2049'
> 192.168.77.100.2049  192.168.77.29.1015   49152      0 49304      0 
> ESTABLISHED
> 192.168.77.100.2049  192.168.77.29.1014   49152      0 49308      0 
> ESTABLISHED
> 192.168.77.100.2049  192.168.77.29.1013   49152      0 49304      0 
> ESTABLISHED
> 192.168.77.100.2049  192.168.77.29.1012   49152      0 49304      0 
> ESTABLISHED
> 192.168.77.29.2049   192.168.77.29.1011   49152      0 49304      0 
> ESTABLISHED
> 192.168.77.29.2049   192.168.77.29.1010   49152      0 49304      0 
> ESTABLISHED
> haggis:/tmp/d# cd /tmp/e
> haggis:/tmp/e# ls -als
> total 7235597
>    3 drwxr-xr-x   2 root     root          10 Mar 16 10:47 .
>    8 drwxrwxrwt  11 root     sys         1433 Mar 16 20:41 ..
> 878969 -rw-------   1 root     root     1073741824 Mar 16 10:49 1g_1
> 794225 -rw-------   1 root     root     1073741824 Mar 16 10:49 1g_2
> 1130659 -rw-------   1 root     root     1073741824 Mar 16 10:49 1g_3
> 865401 -rw-------   1 root     root     1073741824 Mar 16 10:49 1g_4
> 778861 -rw-------   1 root     root     1073741824 Mar 16 10:49 1g_5
> 953995 -rw-------   1 root     root     1073741824 Mar 16 10:49 1g_6
> 969355 -rw-------   1 root     root     1073741824 Mar 16 10:49 1g_7
> 864121 -rw-------   1 root     root     1073741824 Mar 16 10:49 1g_8
> haggis:/tmp/e# netstat -an| egrep '^192.168.77.(29|100).2049'
> 192.168.77.100.2049  192.168.77.29.1015   49152      0 49304      0 
> ESTABLISHED
> 192.168.77.100.2049  192.168.77.29.1014   49152      0 49308      0 
> ESTABLISHED
> 192.168.77.100.2049  192.168.77.29.1013   49152      0 49304      0 
> ESTABLISHED
> 192.168.77.100.2049  192.168.77.29.1012   49152      0 49304      0 
> ESTABLISHED
> 192.168.77.29.2049   192.168.77.29.1011   49152      0 49304      0 
> ESTABLISHED
> 192.168.77.29.2049   192.168.77.29.1010   49152      0 49304      0 
> ESTABLISHED
> 192.168.77.29.2049   192.168.77.29.1009   49152      0 49308      0 
> ESTABLISHED
> 192.168.77.29.2049   192.168.77.29.1008   49152      0 49308      0 
> ESTABLISHED
> _______________________________________________
> nfs-discuss mailing list
> nfs-discuss at opensolaris.org



Reply via email to