[
https://issues.apache.org/jira/browse/HBASE-11492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068593#comment-14068593
]
Andrew Purtell commented on HBASE-11492:
----------------------------------------
+1 on the addendum for 0.98
bq. I think there is another issue with the 0.98 patch: as we don't set the
value for tcpnodelay in our config files, we're using the value from hadoop
common so our default for tcpnodelay is still 'false'
I don't think it is an issue if the default we use is the Hadoop default.
That's the intent of the differences between the 0.98 patch and changes to
later versions. It may not be what we'd want for HBase production settings, but
it is a conservative choice to make in general. On the other hand, if people are
fine with setting the default 'nodelay' to true in 0.98, I'd be happy to file a
follow-up JIRA and make that change.
> Hadoop configuration overrides some ipc parameters including tcpNoDelay
> -----------------------------------------------------------------------
>
> Key: HBASE-11492
> URL: https://issues.apache.org/jira/browse/HBASE-11492
> Project: HBase
> Issue Type: Bug
> Components: regionserver
> Affects Versions: 0.98.0, 0.99.0
> Reporter: Nicolas Liochon
> Assignee: Nicolas Liochon
> Priority: Critical
> Fix For: 0.99.0, 0.98.4, 2.0.0
>
> Attachments: 11492.098.addendum.patch, 11492.v1.patch,
> 11492.v1.withp1.patch, 11492.v2-0.98.patch, 11492.v2.patch, 11492.v2.patch,
> 11492.v3.patch
>
>
> There is an option to set tcpNoDelay, defaulted to true, but the socket
> channel is actually not changed. As a consequence, the server works with
> Nagle's algorithm enabled. This leads to very degraded behavior when a single
> connection is shared between threads, because Nagle's algorithm conflicts with
> TCP delayed ACK.
> Here is an example of performance with the PE tool plus HBASE-11491:
> {noformat}
> oneCon  #client  sleep  exeTime (seconds)               avg latency, sleep excluded (microseconds)
> true    1        0      31                              310
> false   1        0      31                              310
> true    2        0      50                              500
> false   2        0      31                              310
> true    2        5      488 (including 200s sleeping)   2880
> false   2        5      246 (including 200s sleeping)   460
> {noformat}
> The latency is multiplied by roughly 6 (2880 vs 460) when the connection is
> shared. This is the delayed ACK kicking in. This can be fixed by really using
> TCP no delay.
> Any application sharing the tcp connection between threads has the issue.
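
For illustration only, here is a minimal sketch of what "really using tcp no delay" could look like on a plain Java NIO channel. It is not the HBase patch: the class name `NoDelayDemo` and the helpers `applyTcpNoDelay` and `connectAndCheck` are hypothetical, and the loopback server exists only so the example is self-contained. The point is that a configured flag has no effect until it is applied to the socket channel, and that the effective value can be read back to verify it.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.StandardSocketOptions;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class NoDelayDemo {

    // Hypothetical helper: pushes the configured flag down to the channel.
    // Setting TCP_NODELAY to true disables Nagle's algorithm on this socket.
    static void applyTcpNoDelay(SocketChannel channel, boolean tcpNoDelay) throws IOException {
        channel.setOption(StandardSocketOptions.TCP_NODELAY, tcpNoDelay);
    }

    // Opens a throwaway loopback connection, applies the flag, and returns
    // the value the socket actually reports afterwards.
    static boolean connectAndCheck(boolean tcpNoDelay) throws IOException {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            // Port 0 asks the OS for any free ephemeral port.
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            try (SocketChannel client = SocketChannel.open(server.getLocalAddress())) {
                applyTcpNoDelay(client, tcpNoDelay);
                return client.getOption(StandardSocketOptions.TCP_NODELAY);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("TCP_NODELAY reported by socket: " + connectAndCheck(true));
    }
}
```

Reading the option back with `getOption` is a cheap way to catch the situation described in this issue, where a setting exists in configuration but never reaches the channel.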