[ 
https://issues.apache.org/jira/browse/HBASE-11492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068450#comment-14068450
 ] 

Nicolas Liochon commented on HBASE-11492:
-----------------------------------------

(Back from vacation; sorry for the delay in responding, 
[[email protected]].)

The gap between the 0.98 patch and the master patch exists because the master 
patch was limited to the rpcServer, while you covered all ipc* settings in the 
0.98 version. I ran a grep for "ipc." across the code.

v3 of the master patch includes the settings from the 0.98 patch, namely:
- ipc.server.callqueue.read.share (not used in hadoop)
- ipc.server.callqueue.handler.factor (not used in hadoop)
- ipc.server.callqueue.type (used in neither hadoop nor hbase 0.98)
- ipc.server.queue.max.call.delay (used in neither hadoop nor hbase 0.98)
- ipc.server.max.callqueue.length (not used in hadoop)
- ipc.server.scan.vtime.weight (used in neither hadoop nor hbase 0.98)

The same grep on 0.98 shows:
- ipc.ping.interval (feature removed on master, used in hadoop but not in the 
config files)
- ipc.socket.timeout (not used in hadoop, replaced by connect/read/write 
settings on master)

0.98.addendum covers these two settings, with the same double-read (new name / 
old name) strategy.
Master v3 contains the whole patch.

I think there is another issue with the 0.98 patch: since we don't set a value 
for tcpnodelay in our config files, the old name is read from hadoop common, so 
our effective default for tcpnodelay is still 'false'. Master does not have 
this problem because it does not use the double-read strategy.
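
To make the concern above concrete, here is a minimal sketch of a double-read lookup and of how a value inherited from another project's defaults can shadow the intended default. It uses java.util.Properties as a stand-in for the real configuration class, and the key names and the getBoolean helper are assumptions made for illustration, not code taken from the patch.

```java
import java.util.Properties;

public class DoubleReadSketch {
    // Hypothetical key names, used only for this illustration.
    static final String NEW_KEY = "hbase.ipc.server.tcpnodelay";
    static final String OLD_KEY = "ipc.server.tcpnodelay";

    /** Reads newKey first, then oldKey, then falls back to the coded default. */
    static boolean getBoolean(Properties conf, String newKey, String oldKey,
                              boolean codedDefault) {
        String value = conf.getProperty(newKey);
        if (value == null) {
            value = conf.getProperty(oldKey); // old name kept for compatibility
        }
        return value == null ? codedDefault : Boolean.parseBoolean(value);
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        // Stand-in for a value that arrives from hadoop common's defaults.
        conf.setProperty(OLD_KEY, "false");

        // The new key is unset, so the old key wins over the coded default.
        System.out.println(getBoolean(conf, NEW_KEY, OLD_KEY, true)); // false

        // Setting the new key explicitly restores the intended behaviour.
        conf.setProperty(NEW_KEY, "true");
        System.out.println(getBoolean(conf, NEW_KEY, OLD_KEY, true)); // true
    }
}
```

Under these assumptions, the coded default of true is never reached as long as the old name carries a value, which is the situation described for 0.98.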




> Hadoop configuration overrides some ipc parameters including tcpNoDelay
> -----------------------------------------------------------------------
>
>                 Key: HBASE-11492
>                 URL: https://issues.apache.org/jira/browse/HBASE-11492
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.98.0, 0.99.0
>            Reporter: Nicolas Liochon
>            Assignee: Nicolas Liochon
>            Priority: Critical
>             Fix For: 0.99.0, 0.98.4, 2.0.0
>
>         Attachments: 11492.v1.patch, 11492.v1.withp1.patch, 
> 11492.v2-0.98.patch, 11492.v2.patch, 11492.v2.patch
>
>
> There is an option to set tcpNoDelay, which defaults to true, but the value 
> is never actually applied to the socket channel. As a consequence, the server 
> runs with Nagle's algorithm enabled. This severely degrades performance when 
> a single connection is shared between threads, because Nagle's algorithm 
> then interacts badly with TCP delayed ACK. 
> Here is an example of performance with the PE tool plus HBASE-11491:
> {noformat}
> oneCon  #client  sleep  exeTime (seconds)              avg latency, sleep excluded (microseconds)
> true    1        0      31                             310
> false   1        0      31                             310
> true    2        0      50                             500
> false   2        0      31                             310
> true    2        5      488 (including 200s sleeping)  2880
> false   2        5      246 (including 200s sleeping)  460
> {noformat}
> The latency is multiplied by about 6 (2880 vs 460) when the connection is 
> shared. This is the delayed ACK kicking in. It can be fixed by actually 
> applying tcpNoDelay to the socket.
> Any application sharing the TCP connection between threads has the issue.
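
As an illustration of the fix the issue description above asks for, the following sketch applies a tcpNoDelay flag to an accepted SocketChannel using the standard java.nio API and reads the option back from the socket. The class and method names are hypothetical, and the loopback connection exists only to make the example self-contained; it is not the HBase RpcServer code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.StandardSocketOptions;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class TcpNoDelaySketch {
    /** Applies the flag to the channel and returns what the socket reports. */
    static boolean applyTcpNoDelay(SocketChannel channel, boolean tcpNoDelay)
            throws IOException {
        // TCP_NODELAY = true disables Nagle's algorithm on this connection.
        channel.setOption(StandardSocketOptions.TCP_NODELAY, tcpNoDelay);
        return channel.getOption(StandardSocketOptions.TCP_NODELAY);
    }

    /** Opens a loopback connection and applies the flag on the accepted side. */
    static boolean demo(boolean tcpNoDelay) {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            // Port 0 lets the operating system pick a free port.
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            try (SocketChannel client = SocketChannel.open(server.getLocalAddress());
                 SocketChannel accepted = server.accept()) {
                return applyTcpNoDelay(accepted, tcpNoDelay);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("TCP_NODELAY on accepted channel: " + demo(true));
    }
}
```

Reading the option back after setting it is a cheap way to check that a configured value really reached the socket, which is the gap the issue reports.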



--
This message was sent by Atlassian JIRA
(v6.2#6252)
