Fangshi Li created HIVE-19261:
---------------------------------

             Summary: Avro SerDe's InstanceCache should not be synchronized on retrieve
                 Key: HIVE-19261
                 URL: https://issues.apache.org/jira/browse/HIVE-19261
             Project: Hive
          Issue Type: Improvement
            Reporter: Fangshi Li
            Assignee: Fangshi Li


In HIVE-16175, upstream made a patch to fix a thread-safety issue in 
AvroSerDe's InstanceCache by making InstanceCache's retrieve method 
synchronized. While this makes InstanceCache thread-safe, synchronizing 
retrieve can be expensive in a highly concurrent environment such as Spark, 
because every thread must acquire the same lock to enter retrieve, even when 
the requested instance is already cached.
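
For illustration, the synchronized approach looks roughly like the sketch 
below. It is a hypothetical, simplified stand-in: the class name 
SynchronizedInstanceCache and the single-argument makeInstance factory are 
assumptions, and the real InstanceCache's extra parameters and checked 
exceptions are omitted.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for a cache whose retrieve() is
// synchronized: a plain HashMap guarded by the object's monitor.
abstract class SynchronizedInstanceCache<S, I> {
    private final Map<S, I> cache = new HashMap<S, I>();

    // Every caller, on a hit or a miss, must acquire this object's monitor,
    // so concurrent lookups are serialized even when the entry already exists.
    public synchronized I retrieve(S seed) {
        I cached = cache.get(seed);
        if (cached != null) {
            return cached;
        }
        I instance = makeInstance(seed);
        cache.put(seed, instance);
        return instance;
    }

    // Hypothetical factory for a cache miss (assumed signature).
    protected abstract I makeInstance(S seed);
}
```

The lock makes the check-then-put sequence atomic, which is correct, but cache 
hits pay for that same lock, which is the cost this issue is about.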

We propose another way to fix this thread-safety issue: make the underlying 
map of InstanceCache a ConcurrentHashMap. Ideally, the retrieve method would 
then use the atomic computeIfAbsent instead of synchronizing the entire method.
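
A minimal sketch of the Java 8 version, assuming the same hypothetical 
simplified shape as above (ConcurrentInstanceCache and makeInstance are 
illustrative names, not Hive's actual API):

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical Java 8 variant: no method-level lock on retrieve().
abstract class ConcurrentInstanceCache<S, I> {
    private final ConcurrentHashMap<S, I> cache = new ConcurrentHashMap<>();

    // ConcurrentHashMap.computeIfAbsent is atomic per key: on a miss, the
    // mapping function runs at most once and all callers see the same value.
    public I retrieve(S seed) {
        return cache.computeIfAbsent(seed, this::makeInstance);
    }

    // Hypothetical factory for a cache miss (assumed signature). It must not
    // modify this cache, and must not return null.
    protected abstract I makeInstance(S seed);
}
```

Two caveats, hedged: a factory that throws a checked exception (as a real 
SerDe factory might) cannot be passed directly as a lambda and would need 
wrapping; and on Java 8 specifically, computeIfAbsent may lock the hash bin 
even when the key is already present, so calling get() first is a common 
mitigation.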

However, computeIfAbsent is only available in Java 8, and Hive still supports 
Java 7, so we use a pattern that simulates the behavior of computeIfAbsent. 
Once Hive requires Java 8, we should move to computeIfAbsent.
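
The issue does not spell out the pattern, so the following is only one common 
Java 7 way to approximate computeIfAbsent, using get() followed by 
ConcurrentMap.putIfAbsent. Java7InstanceCache and makeInstance are 
hypothetical names.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical Java 7 variant: get() + putIfAbsent() instead of a lock.
abstract class Java7InstanceCache<S, I> {
    private final ConcurrentMap<S, I> cache = new ConcurrentHashMap<S, I>();

    public I retrieve(S seed) {
        // Fast path: a lock-free read when the instance already exists.
        I existing = cache.get(seed);
        if (existing != null) {
            return existing;
        }
        // Slow path: build a candidate without holding any lock. Racing
        // threads may each build one; putIfAbsent stores only the first.
        I candidate = makeInstance(seed);
        I winner = cache.putIfAbsent(seed, candidate);
        // putIfAbsent returns the previously stored value if another thread
        // won the race, or null if our candidate was stored.
        return winner != null ? winner : candidate;
    }

    // Hypothetical factory for a cache miss; must not return null, because
    // ConcurrentHashMap rejects null values.
    protected abstract I makeInstance(S seed);
}
```

Unlike computeIfAbsent, this pattern can call makeInstance more than once for 
the same key under contention (the extra instances are discarded), so it 
assumes instance creation is side-effect free. Every caller still receives the 
single instance that ends up in the map.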



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
