[ https://issues.apache.org/jira/browse/SLING-12473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joerg Hoh updated SLING-12473:
------------------------------
    Description: 
This is a follow-up of SLING-12344.

Even with the improvements added by SLING-12344 I see these stacktraces, 
especially when an instance is just starting up.
{noformat}
        at jdk.internal.misc.Unsafe.park(java.base@…/Native Method)
        - parking to wait for  <0x0000000469fbac10> (a java.util.concurrent.locks.ReentrantReadWriteLock$FairSync)
        at java.util.concurrent.locks.LockSupport.park(java.base@…/LockSupport.java:194)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(java.base@…/AbstractQueuedSynchronizer.java:885)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(java.base@…/AbstractQueuedSynchronizer.java:1009)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(java.base@…/AbstractQueuedSynchronizer.java:1324)
        at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(java.base@…/ReentrantReadWriteLock.java:738)
        at org.apache.sling.scripting.sightly.impl.utils.ScriptDependencyResolver.resolveScript(ScriptDependencyResolver.java:127)
        at org.apache.sling.scripting.sightly.impl.engine.extension.use.RenderUnitProvider.provide(RenderUnitProvider.java:95)
        at org.apache.sling.scripting.sightly.impl.engine.extension.use.UseRuntimeExtension.call(UseRuntimeExtension.java:71)
        at org.apache.sling.scripting.sightly.impl.engine.runtime.RenderContextImpl.call(RenderContextImpl.java:72)
{noformat}

It seems to me that the current code always acquires a read lock when entering 
the method, and that whenever one thread holds the write lock to update the 
cache, all threads invoking resolveScript() are blocked until the write lock is 
released. This happens even for requests that would get a cache hit.
For that reason, as long as entries are added to this cache at a high 
frequency, threads invoking ScriptDependencyResolver.resolveScript() have a 
high chance of being blocked.
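The blocking described above can be reproduced with a small stand-alone sketch. The class LockedCache below is hypothetical and only models the pattern as described in this issue (a fair ReentrantReadWriteLock guarding a plain HashMap, with the read lock taken on every lookup); it is not the actual ScriptDependencyResolver code. While one thread holds the write lock, a lookup for a key that is already cached still cannot complete. The trace also shows the lock in fair mode (FairSync): per the ReentrantReadWriteLock Javadoc, a fair read lock also blocks while a writer thread is waiting, so readers can queue up behind a writer even before it has acquired the lock.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LockedCacheDemo {

    // Hypothetical cache modelled on the pattern described in the issue,
    // NOT the actual ScriptDependencyResolver implementation.
    static final class LockedCache {
        // fair mode, matching the ReentrantReadWriteLock$FairSync in the trace
        private final ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock(true);
        private final Map<String, String> map = new HashMap<>();

        String get(String key) {
            rwLock.readLock().lock(); // taken even when the key is already cached
            try {
                return map.get(key);
            } finally {
                rwLock.readLock().unlock();
            }
        }

        void put(String key, String value) {
            rwLock.writeLock().lock();
            try {
                map.put(key, value);
            } finally {
                rwLock.writeLock().unlock();
            }
        }

        ReentrantReadWriteLock lock() {
            return rwLock;
        }
    }

    // Returns true if a lookup of an already cached key could not finish
    // while another thread held the write lock.
    static boolean hitWasBlocked() throws InterruptedException {
        LockedCache cache = new LockedCache();
        cache.put("/apps/a.html", "resolved-a"); // entry is already cached

        // Simulate a writer holding the write lock, e.g. during a cache update.
        cache.lock().writeLock().lock();
        CountDownLatch done = new CountDownLatch(1);
        Thread reader = new Thread(() -> {
            cache.get("/apps/a.html"); // a cache hit, yet it has to wait
            done.countDown();
        });
        reader.start();
        boolean finishedWhileWriteLocked = done.await(200, TimeUnit.MILLISECONDS);
        cache.lock().writeLock().unlock(); // release; the reader can now proceed
        reader.join();
        return !finishedWhileWriteLocked;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("cache hit blocked by writer: " + hitWasBlocked());
    }
}
```

Running main prints `cache hit blocked by writer: true`: the reader thread cannot acquire the read lock while the write lock is held, so the latch times out before it is counted down.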

Possible mitigations:
* Disable the caching by setting ScriptResolutionCacheSize in the HTL Engine 
to a value less than 1024; this can be used as a workaround.
* Refactor the code so that cache hits can be served without acquiring the 
read lock.
* Refactor the code to use a ConcurrentHashMap (as [~cziegeler] already 
suggested in the context of SLING-12344, 
[Link|https://github.com/apache/sling-org-apache-sling-scripting-sightly/pull/26#issuecomment-2209407602])
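A minimal sketch of the ConcurrentHashMap mitigation, assuming a simple string-keyed cache. ConcurrentScriptCache, maxSize and the resolver function are hypothetical stand-ins, not the HTL engine's API. Lookups go through ConcurrentHashMap.get(), which does not block, so cache hits never wait for writers; misses are resolved outside any lock and stored with putIfAbsent(). Unlike a size-limited LRU cache, this sketch has no eviction: once maxSize entries are stored, further results are returned uncached, and the size check is only approximate under concurrent inserts.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical sketch, not the actual ScriptDependencyResolver implementation.
public class ConcurrentScriptCache {

    private final ConcurrentMap<String, String> cache = new ConcurrentHashMap<>();
    private final int maxSize;
    // Hypothetical stand-in for the real (possibly slow) script resolution.
    private final Function<String, String> resolver;

    public ConcurrentScriptCache(int maxSize, Function<String, String> resolver) {
        this.maxSize = maxSize;
        this.resolver = resolver;
    }

    public String resolve(String key) {
        String cached = cache.get(key); // non-blocking read: hits never wait for writers
        if (cached != null) {
            return cached;
        }
        // Resolve outside any lock; concurrent misses on the same key may resolve twice.
        String resolved = resolver.apply(key);
        if (resolved == null) {
            return null; // ConcurrentHashMap does not accept null values
        }
        // Soft bound: concurrent misses may push the size slightly past maxSize.
        if (cache.size() < maxSize) {
            String previous = cache.putIfAbsent(key, resolved);
            if (previous != null) {
                return previous; // another thread stored a value first; reuse it
            }
        }
        return resolved;
    }

    public int size() {
        return cache.size();
    }

    public static void main(String[] args) {
        AtomicInteger lookups = new AtomicInteger();
        ConcurrentScriptCache c = new ConcurrentScriptCache(2, k -> {
            lookups.incrementAndGet();
            return "resolved:" + k;
        });
        c.resolve("/apps/a.html"); // miss, stored
        c.resolve("/apps/a.html"); // hit, no lookup
        c.resolve("/apps/b.html"); // miss, stored
        c.resolve("/apps/c.html"); // miss, cache full, returned uncached
        System.out.println("lookups=" + lookups.get() + ", cached=" + c.size());
    }
}
```

Running main prints `lookups=3, cached=2`. Resolving outside computeIfAbsent() is a deliberate trade-off: two threads missing on the same key may both run the lookup, but no map update is held up behind a slow resolution, which the computeIfAbsent() Javadoc warns about. Invalidation when scripts change is not modelled here.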


Note: SLING-12471 is unrelated to this specific problem!


  was:
This is a follow-up of SLING-12344.

Even with the improvements added by SLING-12344 I see these stacktraces, 
especially when an instance is just starting up.
{noformat}
        at jdk.internal.misc.Unsafe.park(java.base@…/Native Method)
        - parking to wait for  <0x0000000469fbac10> (a java.util.concurrent.locks.ReentrantReadWriteLock$FairSync)
        at java.util.concurrent.locks.LockSupport.park(java.base@…/LockSupport.java:194)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(java.base@…/AbstractQueuedSynchronizer.java:885)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(java.base@…/AbstractQueuedSynchronizer.java:1009)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(java.base@…/AbstractQueuedSynchronizer.java:1324)
        at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(java.base@…/ReentrantReadWriteLock.java:738)
        at org.apache.sling.scripting.sightly.impl.utils.ScriptDependencyResolver.resolveScript(ScriptDependencyResolver.java:127)
        at org.apache.sling.scripting.sightly.impl.engine.extension.use.RenderUnitProvider.provide(RenderUnitProvider.java:95)
        at org.apache.sling.scripting.sightly.impl.engine.extension.use.UseRuntimeExtension.call(UseRuntimeExtension.java:71)
        at org.apache.sling.scripting.sightly.impl.engine.runtime.RenderContextImpl.call(RenderContextImpl.java:72)
{noformat}

It seems to me that the current code already acquires a read lock when entering 
the method. And that whenever one thread holds the write lock, all threads 
invoking this method blocked until the write lock is released. And this happens 
even for requests which would get a cache hit.
For that reason as long as entries are added to this cache at a high frequency, 
threads invoking ScriptDependencyResolver.resolveScript() have a high chance of 
being blocked by this.

Possible mitigations:
* Disable the caching by setting the ScriptResolutionCacheSize in the HTL 
Engine to a value less than 1024; this can be used as workaround.
* refactor the code, so that cache hits can be served without acquiring the 
read lock.
* refactor the code to use a ConcurrentHashMap (as [~cziegeler] already 
suggested in the context SLING-12344, 
[Link|https://github.com/apache/sling-org-apache-sling-scripting-sightly/pull/26#issuecomment-2209407602])


Note: SLING-12471 is unrelated to this specific problem!



> Lock contention in ScriptDependencyResolver
> -------------------------------------------
>
>                 Key: SLING-12473
>                 URL: https://issues.apache.org/jira/browse/SLING-12473
>             Project: Sling
>          Issue Type: Improvement
>          Components: HTL
>    Affects Versions: Scripting HTL Engine 1.4.24-1.4.0
>            Reporter: Joerg Hoh
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
