[
https://issues.apache.org/jira/browse/HADOOP-11772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14546323#comment-14546323
]
Gopal V commented on HADOOP-11772:
----------------------------------
bq. reproduce the problem is to spawn a client that talks to 200 nodes
concurrently, but unfortunately I don't have the access of the cluster nor
YourKit.
The problem was reported as visible in 1 process talking to 1 NameNode. You do
not need 200 nodes to reproduce this bug - I observed it with a single process
and a single NameNode instance (not even HA).
I got my YourKit license for use with Apache Hive for free - see section (G) of
their license and email their sales folks to get a free license.
Those arguments aside, the earlier patch had a unit test,
testClientCacheFromMultiThreads(), that [~ajisakaa] wrote. When you run it with
the new patch, does it show blocked threads or de-scheduled threads?
This is an important fix late in the cycle, so the new patch should get as much
testing as early as possible.
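For anyone without a profiler, "blocked threads" can also be observed from
inside the JVM through Thread.getState(). The block below is only a minimal
sketch of that idea, not the Hadoop unit test: SyncCache and BlockedThreadProbe
are hypothetical names made up for illustration, and SyncCache merely imitates
a cache whose lookup is a synchronized method. The probe holds the cache's
monitor, starts some workers that call the synchronized method, and counts how
many of them reach the BLOCKED state.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a cache guarded by one monitor; NOT Hadoop's ClientCache. */
final class SyncCache {
    private final Map<String, Object> entries = new HashMap<>();

    // Every caller serializes on the cache's own monitor, as with synchronized(this).
    synchronized Object get(String key) {
        return entries.computeIfAbsent(key, k -> new Object());
    }
}

/** Hypothetical probe: counts workers stuck waiting for the cache's monitor. */
final class BlockedThreadProbe {

    /**
     * Holds the cache monitor, starts the workers, and returns how many of them
     * were in Thread.State.BLOCKED at the last sample taken before either all
     * workers were blocked or timeoutMillis elapsed.
     */
    static int countBlocked(int workers, long timeoutMillis) {
        SyncCache cache = new SyncCache();
        List<Thread> threads = new ArrayList<>();
        int blocked = 0;
        try {
            synchronized (cache) { // same monitor the synchronized method needs
                for (int i = 0; i < workers; i++) {
                    Thread t = new Thread(() -> cache.get("factory"), "probe-" + i);
                    t.start();
                    threads.add(t);
                }
                long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
                while (System.nanoTime() < deadline) {
                    blocked = 0;
                    for (Thread t : threads) {
                        if (t.getState() == Thread.State.BLOCKED) {
                            blocked++;
                        }
                    }
                    if (blocked == workers) {
                        break;
                    }
                    Thread.sleep(1); // sleeping does not release the monitor
                }
            }
            // Monitor released here; the workers can now finish.
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return blocked;
    }

    public static void main(String[] args) {
        System.out.println("blocked workers: " + countBlocked(8, 2000));
    }
}
```

The same sampling loop could be pointed at the threads a real test starts, in
which case a count that stays near zero with the new patch would suggest that
the contention is gone. A thread dump or a profiler gives the same information
with stack traces attached.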
> RPC Invoker relies on static ClientCache which has synchronized(this) blocks
> ----------------------------------------------------------------------------
>
> Key: HADOOP-11772
> URL: https://issues.apache.org/jira/browse/HADOOP-11772
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: ipc, performance
> Reporter: Gopal V
> Assignee: Akira AJISAKA
> Labels: BB2015-05-RFC
> Attachments: HADOOP-11772-001.patch, HADOOP-11772-002.patch,
> HADOOP-11772-003.patch, HADOOP-11772-wip-001.patch,
> HADOOP-11772-wip-002.patch, HADOOP-11772.004.patch, after-ipc-fix.png,
> dfs-sync-ipc.png, sync-client-bt.png, sync-client-threads.png
>
>
> {code}
> private static ClientCache CLIENTS = new ClientCache();
> ...
> this.client = CLIENTS.getClient(conf, factory);
> {code}
> Meanwhile in ClientCache
> {code}
> public synchronized Client getClient(Configuration conf,
>     SocketFactory factory, Class<? extends Writable> valueClass) {
>   ...
>   Client client = clients.get(factory);
>   if (client == null) {
>     client = new Client(valueClass, conf, factory);
>     clients.put(factory, client);
>   } else {
>     client.incCount();
>   }
> {code}
> All invokers end up calling these methods, resulting in IPC clients choking
> up.
> !sync-client-threads.png!
> !sync-client-bt.png!
> !dfs-sync-ipc.png!