Hi Aleksei,
Are you trying to solve this in principle or do you have a concrete
problem at hand which triggers this deadlock? If it is the later, then
some rearrangement of code might do the trick... For example, native
libraries are typically loaded by a class initializer of some class that
is guaranteed to be initialized before the 1st invocation of a native
method from such library. But if such class can also be loaded and
initialized by some other trigger, deadlock can occur. Best remedy for
such situation is to move all native methods to a special class that
serves just for interfacing with native code and also contains an
initializer that loads the native library and nothing else. Such
arrangement would ensure that the order of taking locks is always the
same: classLoadingLock -> nativeLibraryLock ...
Regards, Peter
On 5/20/21 12:31 AM, David Holmes wrote:
On 20/05/2021 2:29 am, Aleksei Voitylov wrote:
On Wed, 19 May 2021 16:21:41 GMT, Aleksei Voitylov
<avoity...@openjdk.org> wrote:
Please review this PR which fixes the deadlock in ClassLoader
between the two lock objects - a lock object associated with the
class being loaded, and the ClassLoader.loadedLibraryNames hash
map, locked during the native library load operation.
Problem being fixed:
The initial reproducer demonstrated a deadlock between the
JarFile/ZipFile and the hash map. That deadlock exists even when
the ZipFile/JarFile lock is removed because there's another lock
object in the class loader, associated with the name of the class
being loaded. Such objects are stored in
ClassLoader.parallelLockMap. The deadlock occurs when JNI_OnLoad()
loads exactly the same class, whose signature is being verified in
another thread.
Proposed fix:
The proposed patch suggests to get rid of locking
loadedLibraryNames hash map and synchronize on each entry name, as
it's done with class names in see
ClassLoader.getClassLoadingLock(name) method.
The patch introduces nativeLibraryLockMap which holds the lock
objects for each library name, and the getNativeLibraryLock()
private method is used to lazily initialize the corresponding lock
object. nativeLibraryContext was changed to ThreadLocal, so that in
any concurrent thread it would have a NativeLibrary object on top
of the stack, that's being currently loaded/unloaded in that
thread. nativeLibraryLockMap accumulates the names of all native
libraries loaded - in line with class loading code, it is not
explicitly cleared.
Testing: jtreg and jck testing with no regressions. A new
regression test was developed.
Aleksei Voitylov has updated the pull request incrementally with one
additional commit since the last revision:
address review comments, add tests
Dear colleagues,
The updated PR addresses review comment regarding ThreadLocal as well
as David' concern around the lock being held during
JNI_OnLoad/JNI_OnUnload calls, and ensures all lock objects are
deallocated. Multiple threads are allowed to enter
NativeLibrary.load() to prevent any thread from locking while another
thread loads a library. Before the update, there could be a class
loading lock held by a parallel capable class loader, which can
deadlock with the library loading lock. As proposed by David Holmes,
the library loading lock was removed because dlopen/LoadLibrary are
thread safe and they maintain internal reference counters on
libraries. There's still a lock being held while a pair of containers
are read/updated. It's not going to deadlock as there's no lock/wait
operation performed while that lock is held. Multiple threads may
create their own copies of NativeLibrary object and register it for
auto unloading.
Tests for auto unloading were added along with the PR update. There
are now 3 jtreg tests:
- one checks for deadlock, similar to the one proposed by Chris Hegarty
- two other tests are for library unload.
The major side effect of that multiple threads are allowed to enter
is that JNI_OnLoad/JNI_OnUnload may be called multiple (but same)
number of times from concurrent threads. In particular, the number of
calls to JNI_OnLoad must be equal to the number of calls to
JNI_OnUnload after the relevant class loader is garbage collected.
This may affect the behaviour that relies on specific order or the
number of JNI_OnLoad/JNI_OnUnload calls. The current JNI
specification does not mandate how many times JNI_OnLoad/JNI_OnUnload
are called. Also, we could not locate tests in jck/jtreg/vmTestbase
that would rely on the specific order or number of calls to
JNI_OnLoad/JNI_OnUnload.
But you can't make such a change! That was my point. To fix the
deadlock we must not hold a lock. But we must ensure only a single
call to JNI_OnLoad is possible. It is an unsolvable problem with those
constraints. You can't just change the behaviour of JNI_OnLoad like that.
David
-----
If this is really a problem that several people are facing, then perhaps
a change in the API could solve it. I'm thinking
Thank you Alan Bateman, David Holmes and Chris Hegarty for your
valuable input.
-------------
PR: https://git.openjdk.java.net/jdk/pull/3976