[ 
https://issues.apache.org/jira/browse/HADOOP-11772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14546323#comment-14546323
 ] 

Gopal V commented on HADOOP-11772:
----------------------------------

bq. reproduce the problem is to spawn a client that talks to 200 nodes 
concurrently, but unfortunately I don't have the access of the cluster nor 
YourKit.

The problem is visible with a single process talking to a single NameNode. 
You do not need 200 nodes to reproduce this bug: I reported it as observed 
with one process and one NameNode instance (not even HA).

I got my YourKit license for use with Apache Hive for free. See section (G) of 
their license and email their sales folks to get a free license.

Those arguments aside, the earlier patch had a unit test, 
testClientCacheFromMultiThreads(), which [~ajisakaa] wrote. When you run it 
with the new patch, does it show blocked or de-scheduled threads?
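
For anyone without a profiler, blocked threads can also be counted from inside 
the JVM. The sketch below is only an illustration, not part of any patch on 
this issue: BlockedThreadProbe, LOCK, countBlocked() and probe() are 
hypothetical names, and the empty synchronized block stands in for whatever 
monitor the test threads contend on. It relies only on Thread.getState().

```java
import java.util.ArrayList;
import java.util.List;

class BlockedThreadProbe {
  // Hypothetical stand-in for the monitor that guards a shared cache.
  private static final Object LOCK = new Object();

  /** Counts the threads that are currently BLOCKED waiting for a monitor. */
  static int countBlocked(List<Thread> threads) {
    int blocked = 0;
    for (Thread t : threads) {
      if (t.getState() == Thread.State.BLOCKED) {
        blocked++;
      }
    }
    return blocked;
  }

  /** Starts nThreads workers that contend for LOCK; returns how many were seen BLOCKED. */
  static int probe(int nThreads) {
    List<Thread> workers = new ArrayList<>();
    int blocked = 0;
    try {
      // Hold the monitor so that every worker has to queue behind it.
      synchronized (LOCK) {
        for (int i = 0; i < nThreads; i++) {
          Thread t = new Thread(() -> {
            synchronized (LOCK) {
              // Empty critical section: the worker only needs to acquire the monitor.
            }
          });
          t.start();
          workers.add(t);
        }
        // Poll for up to five seconds until all workers are parked on the monitor.
        long deadline = System.nanoTime() + 5_000_000_000L;
        while (countBlocked(workers) < nThreads && System.nanoTime() < deadline) {
          Thread.sleep(1);
        }
        blocked = countBlocked(workers);
      }
      // The monitor is released here, so the workers can finish.
      for (Thread t : workers) {
        t.join();
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while probing", e);
    }
    return blocked;
  }

  public static void main(String[] args) {
    System.out.println("blocked workers: " + probe(8));
  }
}
```

In a real run the same countBlocked() check could be sampled periodically over 
the threads that the unit test spawns, before and after applying the patch.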

This is an important fix late in the cycle, so the new patch should get as 
much testing as early as possible.

> RPC Invoker relies on static ClientCache which has synchronized(this) blocks
> ----------------------------------------------------------------------------
>
>                 Key: HADOOP-11772
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11772
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: ipc, performance
>            Reporter: Gopal V
>            Assignee: Akira AJISAKA
>              Labels: BB2015-05-RFC
>         Attachments: HADOOP-11772-001.patch, HADOOP-11772-002.patch, 
> HADOOP-11772-003.patch, HADOOP-11772-wip-001.patch, 
> HADOOP-11772-wip-002.patch, HADOOP-11772.004.patch, after-ipc-fix.png, 
> dfs-sync-ipc.png, sync-client-bt.png, sync-client-threads.png
>
>
> {code}
>   private static ClientCache CLIENTS=new ClientCache();
> ...
>     this.client = CLIENTS.getClient(conf, factory);
> {code}
> Meanwhile in ClientCache
> {code}
> public synchronized Client getClient(Configuration conf,
>       SocketFactory factory, Class<? extends Writable> valueClass) {
> ...
>     Client client = clients.get(factory);
>     if (client == null) {
>       client = new Client(valueClass, conf, factory);
>       clients.put(factory, client);
>     } else {
>       client.incCount();
>     }
> {code}
> All invokers end up calling these methods, resulting in IPC clients choking 
> up.
> !sync-client-threads.png!
> !sync-client-bt.png!
> !dfs-sync-ipc.png!
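
The pattern in the quoted description can be modelled in a few self-contained 
lines. This is a hedged sketch rather than the Hadoop source: FakeClient, 
FakeClientCache and the String key are hypothetical stand-ins for Client, 
ClientCache and SocketFactory, and the initial reference count of 1 is an 
assumption. It shows that every caller passes through the same monitor and 
ends up sharing one reference-counted instance.

```java
import java.util.HashMap;
import java.util.Map;

class SharedCacheModel {
  /** Hypothetical stand-in for the IPC client; it only tracks a reference count. */
  static final class FakeClient {
    private int refCount = 1; // assumption: a new client starts with one reference

    synchronized void incCount() {
      refCount++;
    }

    synchronized int getCount() {
      return refCount;
    }
  }

  /** Mirrors the shape of the quoted getClient: one monitor guards every lookup. */
  static final class FakeClientCache {
    private final Map<String, FakeClient> clients = new HashMap<>();

    synchronized FakeClient getClient(String factoryKey) {
      FakeClient client = clients.get(factoryKey);
      if (client == null) {
        client = new FakeClient();
        clients.put(factoryKey, client);
      } else {
        client.incCount();
      }
      return client;
    }
  }

  /** Runs nThreads callers against one cache and returns the final reference count. */
  static int run(int nThreads) {
    // The issue describes a static cache; a local one keeps repeated runs independent.
    final FakeClientCache cache = new FakeClientCache();
    final FakeClient[] got = new FakeClient[nThreads];
    Thread[] threads = new Thread[nThreads];
    for (int i = 0; i < nThreads; i++) {
      final int slot = i;
      threads[i] = new Thread(() -> got[slot] = cache.getClient("default"));
      threads[i].start();
    }
    try {
      for (Thread t : threads) {
        t.join();
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for callers", e);
    }
    // Every caller must have received the same shared instance.
    for (FakeClient c : got) {
      if (c != got[0]) {
        throw new IllegalStateException("callers received different clients");
      }
    }
    return got[0].getCount();
  }

  public static void main(String[] args) {
    System.out.println("reference count after 16 callers: " + run(16));
  }
}
```

Because getClient is synchronized on the cache itself, the callers execute it 
one at a time regardless of how many threads are running.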



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
