[
https://issues.apache.org/jira/browse/HIVE-19261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alexey Diomin reassigned HIVE-19261:
------------------------------------
Assignee: Alexey Diomin (was: Fangshi Li)
> Avro SerDe's InstanceCache should not be synchronized on retrieve
> -----------------------------------------------------------------
>
> Key: HIVE-19261
> URL: https://issues.apache.org/jira/browse/HIVE-19261
> Project: Hive
> Issue Type: Improvement
> Reporter: Fangshi Li
> Assignee: Alexey Diomin
> Priority: Major
> Labels: pull-request-available
> Attachments: HIVE-19261.1.patch
>
> Time Spent: 40m
> Remaining Estimate: 0h
>
> In HIVE-16175, upstream fixed a thread-safety issue in AvroSerDe's
> InstanceCache by making the retrieve method synchronized. While this makes
> InstanceCache thread-safe, a synchronized retrieve can be expensive in highly
> concurrent environments such as Spark, because all threads contend for a
> single lock that covers the entire retrieve method.
> We propose a different fix for this thread-safety issue: make the underlying
> map of InstanceCache a ConcurrentHashMap. Ideally, retrieve would use the
> atomic computeIfAbsent to avoid synchronizing the entire method.
> However, computeIfAbsent is only available from Java 8 onward, and Hive still
> supports Java 7, so we use a pattern that simulates the behavior of
> computeIfAbsent. In the future, we should move to computeIfAbsent once Hive
> requires Java 8.
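
For illustration, one common Java 7 compatible way to approximate computeIfAbsent on a ConcurrentHashMap is a get followed by putIfAbsent. The sketch below is an assumption about what such a pattern could look like, not the code from HIVE-19261.1.patch; the class names SketchCache and LengthCache are hypothetical, and the real InstanceCache has a different signature.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, simplified stand-in for InstanceCache (not the patched class).
abstract class SketchCache<K, V> {
    private final ConcurrentMap<K, V> cache = new ConcurrentHashMap<K, V>();

    // Builds the value for a key that is not cached yet.
    protected abstract V makeInstance(K key);

    // No lock is held around the whole method; keys must be non-null.
    public V retrieve(K key) {
        V existing = cache.get(key);
        if (existing != null) {
            return existing; // fast path: cache hit
        }
        V created = makeInstance(key);
        // putIfAbsent is atomic: if another thread stored a value first,
        // it returns that value and ours is discarded.
        V raced = cache.putIfAbsent(key, created);
        return raced != null ? raced : created;
    }
}

// Hypothetical concrete cache used only to exercise the sketch.
class LengthCache extends SketchCache<String, Integer> {
    final AtomicInteger calls = new AtomicInteger();

    @Override
    protected Integer makeInstance(String key) {
        calls.incrementAndGet();
        return Integer.valueOf(key.length());
    }
}
```

Unlike ConcurrentHashMap.computeIfAbsent, this pattern may call makeInstance more than once for the same key when threads race, although every caller still gets back the single instance that was stored. That trade-off is only acceptable when creating an instance is free of side effects.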