[
https://issues.apache.org/jira/browse/TEZ-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17041949#comment-17041949
]
Jonathan Turner Eagles commented on TEZ-4090:
---------------------------------------------
This is a matter of policy and is not strictly better to use the
synchronizedMap.
If the policy is to improve performance, we could implement a synchronizedMap
and reduce the lock held time if we assume mostly cache hits (both present and
not null). The we return values without the lock held, reducing contention. In
the case of a cache miss, there is need for a double lock (cache.get() and
cache.put()). If the policy is to reduce memory footprint, synchronizedMap
increases the chance that multiple copies of the value are persisted in memory.
My other assumption about this code is that most access are happening serially
in the event thread (single threaded by default) and that the most important
factor is overall speed of the function and not contention. I need to
re-familiarize myself with these assumptions to make a good policy judgement.
{code}
synchronized T getInstance(final T id) {
final WeakReference<T> cached = cache.get(id);
if (cached != null) {
final T value = cached.get();
if (value != null)
return value;
}
cache.put(id, new WeakReference<T>(id));
return id;
}
{code}
> TezIDCache::getInstance sync lock
> ---------------------------------
>
> Key: TEZ-4090
> URL: https://issues.apache.org/jira/browse/TEZ-4090
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Rajesh Balamohan
> Assignee: László Bodor
> Priority: Minor
> Attachments: tez-task-id-concurrency.png
>
>
> profiler output shared by [~gopalv].
> Looking at the code, it is not mandatory to have sync on getInstance.
> https://github.com/apache/tez/blob/master/tez-common/src/main/java/org/apache/tez/dag/records/TezID.java#L48
--
This message was sent by Atlassian Jira
(v8.3.4#803005)