[ 
https://issues.apache.org/jira/browse/IGNITE-27871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18061881#comment-18061881
 ] 

Ignite TC Bot commented on IGNITE-27871:
----------------------------------------

{panel:title=Branch: [pull/12760/head] Base: [master] : No blockers found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}{panel}
{panel:title=Branch: [pull/12760/head] Base: [master] : New Tests (1)|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}
{color:#00008b}Basic 1{color} [[tests 1|https://ci2.ignite.apache.org/viewLog.html?buildId=8891118]]
* {color:#013220}IgniteBasicTestSuite: GridDeploymentLocalStoreReuseTest.testNoExcessiveLocalDeploymentCacheMisses - PASSED{color}

{panel}
[TeamCity *--> Run :: All* Results|https://ci2.ignite.apache.org/viewLog.html?buildId=8871073&buildTypeId=IgniteTests24Java8_RunAll]

> High latency for locally deployed tasks when peerClassLoadingEnabled=true 
> (deployment lookup contention)
> --------------------------------------------------------------------------------------------------------
>
>                 Key: IGNITE-27871
>                 URL: https://issues.apache.org/jira/browse/IGNITE-27871
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Oleg Valuyskiy
>            Assignee: Oleg Valuyskiy
>            Priority: Major
>              Labels: ise
>          Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> h1. Impact
> Users observe high latency under load when executing Compute tasks that are 
> present on the node classpath while {*}peerClassLoadingEnabled=true{*}. The 
> issue is amplified in multi-node clusters and under concurrent execution (a 
> thin client calling the task by name).
> h1. Root causes (2 related parts)
> *#1*
> The local lookup misses the cache: local deployment metadata is created without 
> a classLoader/classLoaderId, so *GridDeploymentLocalStore#deployment(meta)* 
> cannot match cached deployments, and execution repeatedly falls back to 
> *GridDeploymentLocalStore#deploy()* even when the deployment already exists.
> *#2*
> Contention in *deploy()* under load: *GridDeploymentLocalStore#deploy()* is 
> synchronized and, in the common reuse scenario, performs an O(N) scan over 
> *cache.values()* to locate an existing deployment by ClassLoader. Under high 
> concurrency this leads to lock contention.
> h1. Behaviour
>  * *Expected:* Once a task is available locally, subsequent executions should 
> reuse the cached deployment with minimal synchronization overhead.
>  * *Actual:* Repeated fallback to the synchronized *deploy()* and the expensive 
> scan cause contention and high latency.
> h1. Proposed fix
> *#1*
> Ensure that local deployment lookup metadata includes enough information (class 
> loader and/or loader ID) to allow cache hits for locally available tasks.
> *#2*
> Optimize the *GridDeploymentLocalStore#deploy()* reuse path by using a 
> ClassLoader -> deployment/deps index (O(1)) instead of scanning 
> *cache.values()* under mux (O(N)).
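
The difference between the O(N) scan and the O(1) index described in fix #2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual Ignite code: {{Deployment}}, {{LoaderIndexedStore}}, {{findByScan}}, {{findByIndex}} and the alias-keyed shape of {{cache}} are hypothetical stand-ins. The sketch also assumes one deployment per class loader and that class loaders use the default identity-based equals/hashCode.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical deployment record; a stand-in, not Ignite's real deployment class. */
final class Deployment {
    final ClassLoader loader;
    final String alias;

    Deployment(ClassLoader loader, String alias) {
        this.loader = loader;
        this.alias = alias;
    }
}

/** Hypothetical store that keeps a ClassLoader -> Deployment index next to the alias cache. */
final class LoaderIndexedStore {
    /** Guards structural changes, playing the role of the "mux" mentioned in the issue. */
    private final Object mux = new Object();

    /** Alias -> deployments (assumed shape of the cache, for illustration only). */
    private final Map<String, List<Deployment>> cache = new ConcurrentHashMap<>();

    /** Index for the reuse path; relies on identity-based equals/hashCode of the loaders. */
    private final Map<ClassLoader, Deployment> byLoader = new ConcurrentHashMap<>();

    /** Slow path from the issue description: O(N) scan over all cached deployments under the lock. */
    Deployment findByScan(ClassLoader ldr) {
        synchronized (mux) {
            for (List<Deployment> deps : cache.values())
                for (Deployment d : deps)
                    if (d.loader == ldr)
                        return d;

            return null;
        }
    }

    /** Proposed reuse path: a single map lookup that does not take the lock. */
    Deployment findByIndex(ClassLoader ldr) {
        return byLoader.get(ldr);
    }

    /** Reuses an existing deployment when possible; locks only to register a new one. */
    Deployment deploy(ClassLoader ldr, String alias) {
        Deployment existing = byLoader.get(ldr);

        if (existing != null)
            return existing;

        synchronized (mux) {
            // Re-check under the lock: another thread may have registered the loader meanwhile.
            existing = byLoader.get(ldr);

            if (existing != null)
                return existing;

            Deployment d = new Deployment(ldr, alias);

            cache.computeIfAbsent(alias, k -> new ArrayList<>()).add(d);
            byLoader.put(ldr, d);

            return d;
        }
    }
}
```

In this sketch, repeated {{deploy()}} calls for an already registered class loader return from the lock-free lookup, so only the first registration pays for synchronization. A real implementation would also have to keep the index consistent on undeploy and handle several aliases per class loader, which the sketch leaves out.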



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
