[
https://issues.apache.org/jira/browse/HBASE-11492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068450#comment-14068450
]
Nicolas Liochon commented on HBASE-11492:
-----------------------------------------
(back from vacation, sorry for the delay in responding,
[[email protected]])
The gap between the 0.98 and master patches exists because the master patch was
limited to the rpcServer, while you covered all ipc* settings in the 0.98 version.
I've done a grep on "ipc." in the code.
v3 of the master patch includes the 0.98 ones, namely:
- ipc.server.callqueue.read.share (not used in hadoop)
- ipc.server.callqueue.handler.factor (not used in hadoop)
- ipc.server.callqueue.type (not used in hadoop nor in hbase 0.98)
- ipc.server.queue.max.call.delay (not used in hadoop nor in hbase 0.98)
- ipc.server.max.callqueue.length (not used in hadoop)
- ipc.server.scan.vtime.weight (not used in hadoop nor in hbase 0.98)
The same grep on 0.98 shows:
- ipc.ping.interval (feature removed on master, used in hadoop but not in the
config files)
- ipc.socket.timeout (not used in hadoop, replaced by connect/read/write
settings on master)
0.98.addendum contains these, with the same double-read (new name / old name)
strategy.
master v3 contains the whole patch.
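
For reference, the double-read strategy can be sketched as follows. This is a
minimal illustration, not the patch itself: the class DoubleReadConfig, its
Map-backed storage, and the "hbase."-prefixed new key name are assumptions made
for the example; only the old key name comes from the list above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a configuration object; not the Hadoop/HBase class.
final class DoubleReadConfig {
    private final Map<String, String> values = new HashMap<>();

    void set(String key, String value) {
        values.put(key, value);
    }

    // Double read: prefer the new key, fall back to the old key,
    // and use the supplied default only when neither key is set.
    int getInt(String newKey, String oldKey, int defaultValue) {
        String raw = values.get(newKey);
        if (raw == null) {
            raw = values.get(oldKey);
        }
        return raw == null ? defaultValue : Integer.parseInt(raw.trim());
    }

    public static void main(String[] args) {
        DoubleReadConfig conf = new DoubleReadConfig();
        // Only the old name is set, as in an existing deployment.
        conf.set("ipc.server.max.callqueue.length", "50");
        // The "hbase." prefixed name is an assumed new key, for illustration only.
        System.out.println(conf.getInt(
            "hbase.ipc.server.max.callqueue.length",
            "ipc.server.max.callqueue.length",
            10));
    }
}
```

The point of the fallback is that existing deployments that only set the old
name keep their behavior, while the new name wins as soon as it is set.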
I think there is another issue with the 0.98 patch: since we don't set a value
for tcpnodelay in our config files, the double read picks up the value from
hadoop common, so our effective default for tcpnodelay is still 'false'. Master
doesn't have this problem because it doesn't use the double read strategy.
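
To make the suspected problem concrete, here is a small sketch of how a double
read can be shadowed by an inherited default. Everything in it is an assumption
for illustration: the class name, the use of java.util.Properties in place of
the real configuration class, and both key names. It only models the case where
some lower layer already defines the old key as 'false'.

```java
import java.util.Properties;

// Hypothetical model of the pitfall; key names are assumed, not taken from the patch.
final class TcpNoDelayDefaultPitfall {
    static final String NEW_KEY = "hbase.ipc.server.tcpnodelay"; // assumed new name
    static final String OLD_KEY = "ipc.server.tcpnodelay";       // assumed old name

    // Double read: new key, then old key, then the intended default.
    static boolean doubleRead(Properties conf, boolean intendedDefault) {
        String raw = conf.getProperty(NEW_KEY);
        if (raw == null) {
            raw = conf.getProperty(OLD_KEY);
        }
        return raw == null ? intendedDefault : Boolean.parseBoolean(raw.trim());
    }

    // Single read: only the new key is consulted.
    static boolean singleRead(Properties conf, boolean intendedDefault) {
        String raw = conf.getProperty(NEW_KEY);
        return raw == null ? intendedDefault : Boolean.parseBoolean(raw.trim());
    }

    // A config where we set neither key, but an inherited defaults layer
    // defines the old key as "false".
    static Properties confWithInheritedDefaults() {
        Properties inherited = new Properties();
        inherited.setProperty(OLD_KEY, "false");
        return new Properties(inherited);
    }

    public static void main(String[] args) {
        Properties conf = confWithInheritedDefaults();
        System.out.println("double read: " + doubleRead(conf, true));
        System.out.println("single read: " + singleRead(conf, true));
    }
}
```

In this model the double read never reaches the intended default of true,
because the old key always resolves to the inherited value; the single read does.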
> Hadoop configuration overrides some ipc parameters including tcpNoDelay
> -----------------------------------------------------------------------
>
> Key: HBASE-11492
> URL: https://issues.apache.org/jira/browse/HBASE-11492
> Project: HBase
> Issue Type: Bug
> Components: regionserver
> Affects Versions: 0.98.0, 0.99.0
> Reporter: Nicolas Liochon
> Assignee: Nicolas Liochon
> Priority: Critical
> Fix For: 0.99.0, 0.98.4, 2.0.0
>
> Attachments: 11492.v1.patch, 11492.v1.withp1.patch,
> 11492.v2-0.98.patch, 11492.v2.patch, 11492.v2.patch
>
>
> There is an option to set tcpNoDelay, defaulted to true, but it is never
> actually applied to the socket channel. As a consequence, the server works
> with Nagle enabled. This leads to very degraded behavior when a single
> connection is shared between threads: Nagle's algorithm conflicts with TCP
> delayed ACK.
> Here is an example of performance with the PE tool plus HBASE-11491:
> {noformat}
> oneCon  #client  sleep  exeTime (seconds)              avg latency, sleep excluded (microseconds)
> true    1        0      31                             310
> false   1        0      31                             310
> true    2        0      50                             500
> false   2        0      31                             310
> true    2        5      488 (including 200s sleeping)  2880
> false   2        5      246 (including 200s sleeping)  460
> {noformat}
> The latency is multiplied by roughly 6 (2880 vs 460) when the connection is
> shared. This is the delayed ACK kicking in. It can be fixed by actually
> enabling TCP no delay.
> Any application sharing the tcp connection between threads has the issue.
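
As an illustration of what "really using tcp no delay" means at the socket
level, here is a minimal sketch using the standard java.nio API. The class and
method names are hypothetical and this is not the code from the patch; it only
shows the flag being pushed down to the channel and read back.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.StandardSocketOptions;
import java.nio.channels.SocketChannel;

// Hypothetical helper, for illustration only.
final class TcpNoDelaySketch {
    // Apply the configured flag to the channel itself; reading the option
    // from configuration without this call leaves Nagle enabled.
    static void apply(SocketChannel channel, boolean tcpNoDelay) throws IOException {
        channel.setOption(StandardSocketOptions.TCP_NODELAY, tcpNoDelay);
    }

    // Open an unconnected channel, apply the flag, and report what the
    // socket says afterwards.
    static boolean applyAndReadBack(boolean tcpNoDelay) {
        try (SocketChannel channel = SocketChannel.open()) {
            apply(channel, tcpNoDelay);
            return channel.getOption(StandardSocketOptions.TCP_NODELAY);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("TCP_NODELAY after apply: " + applyAndReadBack(true));
    }
}
```

Reading the option back is a cheap way to check that the setting reached the
socket rather than stopping at the configuration object.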