[jira] [Comment Edited] (HADOOP-13144) Enhancing IPC client throughput via multiple connections per user

JIRA Wed, 10 Apr 2019 16:15:15 -0700


    [ 
https://issues.apache.org/jira/browse/HADOOP-13144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16814930#comment-16814930
 ]


Íñigo Goiri edited comment on HADOOP-13144 at 4/10/19 11:14 PM:
----------------------------------------------------------------

Thanks [~vinayrpet] for the comments.
From my experiments, the Namenode is not the main issue and it can absorb the 
load as it has enough readers and handlers.
Security is also disabled.

I added [^HADOOP-13144-performance.patch] with the whole setup.
It also contains the full proposal, including the Router changes.

I have to say the improvements are not as large as I remembered, and the 
results are very sensitive to the order.
After running 20 times and excluding warm-up times, we can see:
# A single Router using the old approach (single connection pool): 483.5 ms
# A single Router using multiple users: 455.2 ms
# A single Router using the new approach (multiple connections): 458.5 ms
# Multiple Routers using the old approach: 483.5 ms
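
To put the four numbers above side by side, here is a minimal sketch that computes the reduction of each setup relative to setup 1. The class and method names are made up for illustration and are not part of the attached patch; the only inputs are the times reported above.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class RouterLatencyComparison {

    /** Percentage reduction of {@code value} relative to {@code baseline}. */
    static double reductionPercent(double baseline, double value) {
        return (baseline - value) / baseline * 100.0;
    }

    public static void main(String[] args) {
        // Times in ms, copied from the list above; setup 1 is the baseline.
        double baseline = 483.5;
        Map<String, Double> results = new LinkedHashMap<>();
        results.put("1. single Router, old approach", 483.5);
        results.put("2. single Router, multiple users", 455.2);
        results.put("3. single Router, multiple connections", 458.5);
        results.put("4. multiple Routers, old approach", 483.5);

        for (Map.Entry<String, Double> e : results.entrySet()) {
            System.out.printf(Locale.ROOT, "%-42s %6.1f ms  %5.2f%% lower%n",
                e.getKey(), e.getValue(),
                reductionPercent(baseline, e.getValue()));
        }
    }
}
```

With these inputs the reduction is a bit under 6% for setup 2 and a bit over 5% for setup 3, while setup 4 shows no change from the baseline.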

One can see that 2 and 3 are pretty similar, as one would expect from the 
analysis.
I'm a little surprised by number 4 though.

These results are not as spectacular as what I observed in HDFS-14316.
My guess is that when everything is fine, there is not much difference.
However, it gets much worse when there are timeouts.
I think in that case, Routers are stuck creating connections and that's when 
one sees the worst impact of having just one socket factory.




> Enhancing IPC client throughput via multiple connections per user
> -----------------------------------------------------------------
>
>                 Key: HADOOP-13144
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13144
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: ipc
>            Reporter: Jason Kace
>            Assignee: Íñigo Goiri
>            Priority: Minor
>         Attachments: HADOOP-13144-performance.patch, HADOOP-13144.000.patch, 
> HADOOP-13144.001.patch, HADOOP-13144.002.patch, HADOOP-13144.003.patch
>
>
> The generic IPC client ({{org.apache.hadoop.ipc.Client}}) uses a single 
> connection thread for each {{ConnectionId}}.  The {{ConnectionId}} is unique 
> to the connection's remote address, ticket and protocol.  Each 
> {{ConnectionId}} is mapped 1:1 to a connection thread by the client via a 
> map cache.
> As a result, all IPC read/write activity is serialized through a single 
> thread for each user/ticket + address.  If a single user makes repeated 
> calls (1k-100k/sec) to the same destination, the IPC client becomes a 
> bottleneck.
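
The description above can be illustrated with a small sketch of such a map cache. This is not the code of {{org.apache.hadoop.ipc.Client}} nor of the attached patches: the {{Key}} and {{Connection}} classes, the {{index}} field and the round-robin choice are all assumptions made for illustration. With one slot per user, every call for the same address/ticket/protocol lands on the same cached connection; with several slots, the calls are spread over several connections.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class ConnectionCacheSketch {

    /** Hypothetical stand-in for ConnectionId, extended with a slot index. */
    static final class Key {
        final String address;
        final String ticket;
        final String protocol;
        final int index;

        Key(String address, String ticket, String protocol, int index) {
            this.address = address;
            this.ticket = ticket;
            this.protocol = protocol;
            this.index = index;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Key)) {
                return false;
            }
            Key k = (Key) o;
            return index == k.index && address.equals(k.address)
                && ticket.equals(k.ticket) && protocol.equals(k.protocol);
        }

        @Override
        public int hashCode() {
            return Objects.hash(address, ticket, protocol, index);
        }
    }

    /** Hypothetical stand-in for a connection and its single I/O thread. */
    static final class Connection {
        final Key key;

        Connection(Key key) {
            this.key = key;
        }
    }

    private final Map<Key, Connection> connections = new ConcurrentHashMap<>();
    private final AtomicInteger next = new AtomicInteger();
    private final int connectionsPerUser;

    ConnectionCacheSketch(int connectionsPerUser) {
        if (connectionsPerUser < 1) {
            throw new IllegalArgumentException("need at least one connection");
        }
        this.connectionsPerUser = connectionsPerUser;
    }

    /** Picks a slot round-robin and returns the cached connection for it. */
    Connection getConnection(String address, String ticket, String protocol) {
        int index = Math.floorMod(next.getAndIncrement(), connectionsPerUser);
        Key key = new Key(address, ticket, protocol, index);
        return connections.computeIfAbsent(key, Connection::new);
    }

    int size() {
        return connections.size();
    }

    public static void main(String[] args) {
        ConnectionCacheSketch single = new ConnectionCacheSketch(1);
        ConnectionCacheSketch multi = new ConnectionCacheSketch(4);
        for (int i = 0; i < 100; i++) {
            single.getConnection("nn1:8020", "alice", "ClientProtocol");
            multi.getConnection("nn1:8020", "alice", "ClientProtocol");
        }
        System.out.println("single: " + single.size() + " connection(s)");
        System.out.println("multi:  " + multi.size() + " connection(s)");
    }
}
```

The counter here is shared across all users and destinations, which is good enough for a sketch; a real client would also have to handle connection setup, idle timeouts and teardown, none of which is modeled here.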



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

