[ https://issues.apache.org/jira/browse/HBASE-11492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14058951#comment-14058951 ]

Nicolas Liochon commented on HBASE-11492:
-----------------------------------------

The setting comes from hadoop-common, so the server reads it as well, and it
overrides the value set in our code. This is not an issue in 0.94/0.96, since
our default there is the same as Hadoop's.
If we don't disable Nagle's algorithm on the HBase server, we hit an issue
caused by its interaction with TCP delayed ACK. It is not related to the DFS
client.
The bad scenario occurs when the server has two responses to send back, each
fitting in a single packet: with Nagle, the first response is sent
immediately, but the second one waits for an ACK from the client. With TCP
delayed ACK, the client does not ACK right away if it has nothing to write to
the server. That's why the issue is so visible with the sleep: the reply is
delayed by the sleep time (the next get triggers the ACK of message N-1, so
the server then sends message N).



> The servers do not honor the tcpNoDelay option
> ----------------------------------------------
>
>                 Key: HBASE-11492
>                 URL: https://issues.apache.org/jira/browse/HBASE-11492
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.92.2, 0.98.0, 0.96.0, 0.99.0, 0.94.20
>            Reporter: Nicolas Liochon
>            Assignee: Nicolas Liochon
>            Priority: Critical
>             Fix For: 0.99.0, 0.98.5, 0.94.22
>
>         Attachments: 11492.v1.patch, 11492.v1.withp1.patch
>
>
> There is an option to set tcpNoDelay, which defaults to true, but it is never
> actually applied to the socket channel. As a consequence, the server runs
> with Nagle's algorithm enabled. This severely degrades performance when a
> single connection is shared between threads, because Nagle's algorithm
> conflicts with TCP delayed ACK.
> Here are performance results with the PE (PerformanceEvaluation) tool plus
> HBASE-11491:
> {noformat}
> oneCon  #client  sleep  exeTime (seconds)              avg latency, sleep excluded (microseconds)
> true    1        0      31                             310
> false   1        0      31                             310
> true    2        0      50                             500
> false   2        0      31                             310
> true    2        5      488 (including 200s sleeping)  2880
> false   2        5      246 (including 200s sleeping)  460
> {noformat}
> The latency is more than six times higher (2880 vs 460 microseconds) when the
> connection is shared. This is delayed ACK kicking in. It can be fixed by
> actually applying TCP no delay to the socket.
> Any application that shares a TCP connection between threads is affected.
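
The fix the description calls for amounts to pushing the configured flag down
to each accepted socket. Below is a minimal Java NIO sketch of that step, not
the actual HBASE-11492 patch: AcceptWithNoDelay and configureAccepted are
hypothetical names, and the local tcpNoDelay variable stands in for the
server's configured value. It uses only standard java.nio calls.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.StandardSocketOptions;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public final class AcceptWithNoDelay {

    // Hypothetical helper: configure a freshly accepted connection.
    static void configureAccepted(SocketChannel channel, boolean tcpNoDelay) throws IOException {
        // An NIO listener typically switches accepted channels to non-blocking mode.
        channel.configureBlocking(false);
        // The step the bug report says was missing: apply the option to the socket.
        channel.setOption(StandardSocketOptions.TCP_NODELAY, tcpNoDelay);
        // Pre-Java 7 equivalent: channel.socket().setTcpNoDelay(tcpNoDelay);
    }

    public static void main(String[] args) throws IOException {
        boolean tcpNoDelay = true; // stands in for the server's configured value
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress("127.0.0.1", 0)); // ephemeral port
            try (SocketChannel client = SocketChannel.open(server.getLocalAddress());
                 SocketChannel accepted = server.accept()) {
                configureAccepted(accepted, tcpNoDelay);
                System.out.println("TCP_NODELAY on accepted socket: "
                        + accepted.getOption(StandardSocketOptions.TCP_NODELAY));
            }
        }
    }
}
```

In Java, ServerSocketChannel does not expose TCP_NODELAY as a supported
option, so the flag has to be set on each accepted SocketChannel.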



--
This message was sent by Atlassian JIRA
(v6.2#6252)
