[ 
https://issues.apache.org/jira/browse/HADOOP-11772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14545934#comment-14545934
 ] 

Gopal V commented on HADOOP-11772:
----------------------------------

bq. The RPC client will send out the request asynchronously.

It already sends asynchronously, so nothing fails even without this patch.

The problem is that it takes 200-300ms to send it out, by which time another 
IPC update has already queued up for the same connection.

See the two threads locked against each other in the bug report, where one is 
doing a NameNode operation and another is doing an ApplicationMaster update. 
These two need never lock against each other in reality; they do so only 
because they both use the same {{ipc.Client}} singleton.
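
To make the contention concrete, here is a minimal sketch, not Hadoop code: {{SharedClient}} and its {{send()}} method are hypothetical stand-ins, and the 200 ms sleep only simulates a slow send performed while a lock is held. It assumes the lock is a plain monitor on the shared object and does not reproduce the internals of {{ipc.Client}}.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class SharedLockDemo {
    // Hypothetical stand-in for a process-wide singleton client.
    static final class SharedClient {
        // The monitor is held for the whole (simulated) send.
        synchronized void send(String what, long millis) throws InterruptedException {
            Thread.sleep(millis);
        }
    }

    // Runs two unrelated callers at once and returns the wall-clock time in ms.
    static long elapsedMillisForTwoCallers(SharedClient a, SharedClient b, long sendMillis)
            throws InterruptedException {
        CountDownLatch start = new CountDownLatch(1);
        Thread t1 = new Thread(() -> run(a, start, "namenode-op", sendMillis));
        Thread t2 = new Thread(() -> run(b, start, "app-master-update", sendMillis));
        t1.start();
        t2.start();
        long begin = System.nanoTime();
        start.countDown(); // release both callers together
        t1.join();
        t2.join();
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - begin);
    }

    private static void run(SharedClient client, CountDownLatch start, String what, long millis) {
        try {
            start.await();
            client.send(what, millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        SharedClient singleton = new SharedClient();
        // Both callers share one object, so the second waits for the first.
        long shared = elapsedMillisForTwoCallers(singleton, singleton, 200);
        // Each caller has its own object, so the sends can overlap.
        long separate = elapsedMillisForTwoCallers(new SharedClient(), new SharedClient(), 200);
        System.out.println("shared singleton: " + shared + " ms");
        System.out.println("separate clients: " + separate + " ms");
    }
}
```

With the shared object the two sends run back to back, so the second caller pays for the first caller's send even though the two operations are unrelated.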

If you want to revisit this fix, please remove the Client singleton or find 
another way to remove the synchronization barrier around {{getConnection()}} 
and the way it prevents reopening connections for IPC.
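
As an illustration of "another way", one generic option is a per-key lookup that does not take a single global monitor. The sketch below is an assumption for discussion only, not the HADOOP-11772 patch and not the real {{ipc.Client}} API: {{ConnectionCacheSketch}}, {{Connection}}, and the string key are hypothetical names.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class ConnectionCacheSketch {
    // Hypothetical connection object, keyed by a remote address string.
    static final class Connection {
        final String remoteId;

        Connection(String remoteId) {
            this.remoteId = remoteId;
        }
    }

    private final ConcurrentHashMap<String, Connection> connections = new ConcurrentHashMap<>();

    // Counts how many connections were actually created (for inspection only).
    final AtomicInteger created = new AtomicInteger();

    // No method-level lock: lookups of existing keys do not block each other,
    // and creation is atomic per key rather than serialized on one monitor.
    Connection getConnection(String remoteId) {
        return connections.computeIfAbsent(remoteId, id -> {
            created.incrementAndGet();
            return new Connection(id);
        });
    }

    public static void main(String[] args) {
        ConnectionCacheSketch cache = new ConnectionCacheSketch();
        Connection nn1 = cache.getConnection("namenode:8020");
        Connection nn2 = cache.getConnection("namenode:8020");
        Connection am = cache.getConnection("appmaster:10200");
        System.out.println("same namenode connection reused: " + (nn1 == nn2));
        System.out.println("distinct connection for app master: " + (nn1 != am));
        System.out.println("connections created: " + cache.created.get());
    }
}
```

This only addresses the lookup path; it says nothing about how long a send holds a connection, which is a separate part of the problem described above.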

The current IPC implementation works asynchronously, but is too slow to keep up 
with sub-second performance on a multi-threaded daemon where 24 cores share a 
single locked singleton for everything (namenode lookups, app master 
heartbeats, data movement events, statistic updates, error recovery).

> RPC Invoker relies on static ClientCache which has synchronized(this) blocks
> ----------------------------------------------------------------------------
>
>                 Key: HADOOP-11772
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11772
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: ipc, performance
>            Reporter: Gopal V
>            Assignee: Akira AJISAKA
>              Labels: BB2015-05-RFC
>         Attachments: HADOOP-11772-001.patch, HADOOP-11772-002.patch, 
> HADOOP-11772-003.patch, HADOOP-11772-wip-001.patch, 
> HADOOP-11772-wip-002.patch, after-ipc-fix.png, dfs-sync-ipc.png, 
> sync-client-bt.png, sync-client-threads.png
>
>
> {code}
>   private static ClientCache CLIENTS=new ClientCache();
> ...
>     this.client = CLIENTS.getClient(conf, factory);
> {code}
> Meanwhile in ClientCache
> {code}
> public synchronized Client getClient(Configuration conf,
>       SocketFactory factory, Class<? extends Writable> valueClass) {
> ...
>     Client client = clients.get(factory);
>     if (client == null) {
>       client = new Client(valueClass, conf, factory);
>       clients.put(factory, client);
>     } else {
>       client.incCount();
>     }
> {code}
> All invokers end up calling these methods, resulting in IPC clients choking 
> up.
> !sync-client-threads.png!
> !sync-client-bt.png!
> !dfs-sync-ipc.png!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
