[
https://issues.apache.org/jira/browse/THRIFT-5430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jens Geyer closed THRIFT-5430.
------------------------------
> FieldMetaData synchronized method can trigger deadlock during static class
> initialization in JVM native code
> ------------------------------------------------------------------------------------------------------------
>
> Key: THRIFT-5430
> URL: https://issues.apache.org/jira/browse/THRIFT-5430
> Project: Thrift
> Issue Type: Bug
> Components: Java - Library
> Affects Versions: 0.6, 0.6.1, 0.7, 0.8, 0.9, 0.9.1, 0.9.2, 0.9.3, 0.9.3.1,
> 0.10.0, 0.11.0, 0.12.0, 0.13.0, 0.14.0, 0.14.1
> Environment: AdoptJDK 1.8 (Java 8) ; Linux, AWS machines.
> Deadlock is environment independent and occurs in conformance with the JRE 8
> spec.
>
> Reporter: Divye Kapoor
> Assignee: Divye Kapoor
> Priority: Major
> Fix For: 0.15.0
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> We (Pinterest) hit the following deadlock during JVM classloading of Thrift
> classes. The root cause was triggering sClass.newInstance() while holding the
> synchronized lock on FieldMetaData.class::getStructMetaDataMap(..)
> Here's the stacktraces of interest:
> Thread 1:
> {code:java}
> Stack Trace is:
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) //
> stuck here for > 30 mins
> at
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> at java.lang.Class.newInstance(Class.java:442) // creating an object via
> reflection.
> at
> org.apache.thrift.meta_data.FieldMetaData.getStructMetaDataMap(FieldMetaData.java:61)-
> locked java.lang.Class@346242bb // Lock held on FieldMetaData.class
> at
> com.pinterest.commons.thrift.ThriftStructDescriptor.<init>(ThriftStructDescriptor.java:56)
> at
> com.pinterest.commons.thrift.ThriftStructDescriptor.getDescriptor(ThriftStructDescriptor.java:38)
> {code}
> Thread 2:
> {code:java}
> Stack Trace is:
> at
> org.apache.thrift.meta_data.FieldMetaData.addStructMetaDataMap(FieldMetaData.java:49)
> - blocked on java.lang.Class@346242bb // Lock held by Thread 1.
> at
> com.pinterest.schemas.search.query.NavboostScore.<clinit>(NavboostScore.java:146)
> at
> com.pinterest.schemas.search.query.NavboostScores.read(NavboostScores.java:334)
> at
> com.pinterest.schemas.search.query.SupplementaryTerm.read(SupplementaryTerm.java:1209)
> at
> com.pinterest.schemas.search.query.SupplementaryTerms.read(SupplementaryTerms.java:335)
> at com.pinterest.schemas.search.query.QueryJoin.read(QueryJoin.java:6514)
> at
> com.pinterest.pinpin.thrift.SerializedResourceContainer.read(SerializedResourceContainer.java:2867)
> at org.apache.thrift.TDeserializer.deserialize(TDeserializer.java:69) //
> General Thrift deserialize. {code}
> Here's the code of interest:
> [https://github.com/apache/thrift/blob/65fb49bb41f852375b278c9057d52c9472f0cb3a/lib/java/src/org/apache/thrift/meta_data/FieldMetaData.java#L61]
>
> Thread 1 has the following lock acquisition order:
> {noformat}
> FieldMetaData.class::getStructMetaDataMap
> -> lock FieldMetaData.class (LOCK 1)
> -> sClass.newInstance()
> -> load sClass
> -> JVM init_lock (native) lock and release. (LOCK 2)
> -> waiting for Thread 2 to complete static initialization and notify
> thread 1.{noformat}
> Thread 2 has the following lock acquisition order:
> {noformat}
> sClass.newInstance() (from new ThriftObject() / deserialization)
> -> load sClass
> -> static initialization
> -> try locking FieldMetaData.class when calling
> FieldMetaData.addStructMetaDataMap (WAIT on LOCK 1 which is never released)
> ---> does not ever trigger the next step:
> ---> Notify Thread 1 that static initialization of the class has
> completed (releasing LOCK 2 on Thread 1 which will eventually release LOCK 1
> by returning from native code).{noformat}
> Internally, this was a fairly detailed investigation. Would be great if we
> agree on a Fix approach. This deadlock affects all versions of Thrift since
> 0.9 which introduced the synchronized keyword. Versions prior to 0.9 are
> simply thread unsafe (because there's no lock on FieldMetaData's internal
> HashMap and the two static method calls can race). I didn't check versions
> below 0.5 but probably this extends all the way back.
> This is not an issue in fbThrift, which correctly uses a ConcurrentHashMap
> and does not take a lock on FieldMetaData. See this code here:
> [https://github.com/facebook/fbthrift/blob/c0f8ffb345d847df512ae9c404a58379fcd51cb9/thrift/lib/java/src/main/java/com/facebook/thrift/meta_data/FieldMetaData.java]
> vs the buggy code here:
> [https://github.com/apache/thrift/blob/v0.14.1/lib/java/src/org/apache/thrift/meta_data/FieldMetaData.java]
>
> Another alternative approach is to use
> {code:java}
> synchronized (structMap} {
> ...
> } {code}
> instead of the synchronized keyword on the methods.
> Please confirm which approach is preferred and we can send a PR on Github.
>
> Here's the original deadlock hypothesis as outlined by Jiahuan Liu:
> The hypothesis of the deadlock:
> ThriftStructDescriptor.get is used in many places for deserialization and it
> calls FieldMetaData.getStructMetaDataMap. inside this method, it tries to
> load the thrift class if the thrift has not been loaded yet. thrift classes
> call FieldMetaData.addStructMetaDataMap in a static block. so the deadlock
> may happen when:
> * thread A is trying to deserialize -> getStructMetaDataMap -> class loading
> * if thread B is already trying to load the same thrift, it requires
> addStructMetaDataMap to complete but this method is blocked by thread A
> getStructMetaDataMap
> * the class loading in thread A will be blocked by thread B because class
> loading is also synchronized by java. see this section in java spec:
> If the Class object for C indicates that initialization is in progress for C
> by some other thread, then release LC and block the current thread until
> informed that the in-progress initialization has completed, at which time
> repeat this step.
> Here's the classloading spec in java 8:
> [https://docs.oracle.com/javase/specs/jls/se8/html/jls-12.html#jls-12.4.2]
> Thread 1 is stuck in step 2 of the initialization (waiting for Thread 2 to
> notify in Step 10).
> Thread 2 is stuck in step 9 of the initialization and never makes it to step
> 10.
> Precondition: this is the first object creation for that Thrift object and
> at-least 2 threads are racing to create the object, one of which holds a lock
> on FieldMetaData.class.
> Please reach out if this is unclear. This hypothesis was validated by reading
> through the native JVM code on the class load -> static_initialization path.
> Here's the native JVM code if you're interested:
> [InstanceKlass::initialize_impl|https://github.com/AdoptOpenJDK/openjdk-jdk8u/blob/f0eac5a3aff7d413e2ef0532d7464a58d7a15237/hotspot/src/share/vm/oops/instanceKlass.cpp#L833]
--
This message was sent by Atlassian Jira
(v8.3.4#803005)