[
https://issues.apache.org/jira/browse/HADOOP-12475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Walter Su updated HADOOP-12475:
-------------------------------
Attachment: HADOOP-12475.02.patch
1. Connection leak means an opened connection which forgets to close. It
doesn't happen here. Connection is opened at {{setupIOstreams}}. And it will be
closed even it's not in the cache.
2. It's true a race condition can cause creating multiple connections for the
same remoteId. I replace it with {{putIfAbsent}}. Thanks [~Apache9], [~sjlee0]
3. There is another race condition about removing. It's caused by HADOOP-11772.
Before HADOOP-11772, Client used {{HashTable}} for caching. The removing logic
is
{code}
1175 synchronized (connections) {
1176 if (connections.get(remoteId) == this) {
1177 connections.remove(remoteId);
1178 }
1179 }
{code}
It bothers me a while why a thread-safe HashTable need {{synchronized}}. Then I
know this connection is closed, should be removed. But other thread could have
already known this closedConnection, and replace it with a new connection.
I use conditional remove {{ConcurrentHashMap.remove(key,value)}} to address
this.
I think before HADOOP-11772, {{synchronized (connections)}} (which locks the
whole table and HashTable also locks whole table) is the reason why the
invokers choke up.
Uploaded 02 patch.
> Replace guava Cache with ConcurrentHashMap for caching Connection in ipc
> Client
> -------------------------------------------------------------------------------
>
> Key: HADOOP-12475
> URL: https://issues.apache.org/jira/browse/HADOOP-12475
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: conf, io, ipc
> Reporter: Walter Su
> Assignee: Walter Su
> Attachments: HADOOP-12475.01.patch, HADOOP-12475.02.patch
>
>
> quote [~daryn] from HADOOP-11772:
> {quote}
> CacheBuilder is obscenely expensive for concurrent map, and it requires
> generating unnecessary garbage even just to look up a key. Replace it with
> ConcurrentHashMap.
> I identified this issue that impaired my own perf testing under load. The
> slowdown isn't just the sync. It's the expensive of Connection's ctor
> stalling other connections. The expensive of ConnectionId#equals causes
> delays. Synch'ing on connections causes unfair contention unlike a sync'ed
> method. Concurrency simply hides this.
> {quote}
> BTW, guava Cache is heavyweight. Per local test, ConcurrentHashMap has better
> overal performance.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)