[
https://issues.apache.org/jira/browse/IGNITE-10418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Maxim Muzafarov updated IGNITE-10418:
-------------------------------------
Fix Version/s: (was: 2.8)
> Implement lightweight profiling of messages processing
> ------------------------------------------------------
>
> Key: IGNITE-10418
> URL: https://issues.apache.org/jira/browse/IGNITE-10418
> Project: Ignite
> Issue Type: New Feature
> Reporter: Alexei Scherbakov
> Assignee: Denis Chudov
> Priority: Major
>
> Currently it is hard to identify bottlenecks without extensive profiling on
> the server and client side (JFR recording, sampling profilers, regular thread
> dumps, etc.), which is not always possible. Even when profiling data is
> available, it does not always help to pinpoint certain types of bottlenecks,
> for example, contention on a single key/partition.
> Lightweight message profiling will make it possible to track the execution of
> each message, to collect execution statistics in executors for each grid node
> and for all nodes, and to build histograms of waiting/execution time for each
> message type.
> We need to implement:
> # Histogram metrics for message execution time, queue waiting time, and queue
> size at the moments of queue add and execution start, with distribution by
> message type;
> # Dumping of messages whose execution/waiting time exceeds some threshold,
> e.g.
> {code:java}
> Slow message: *enqueueTs*=2018-11-27 15:10:22.241, *waitTime*=0.048,
> *procTime*=305.186, *messageId*=3a3064a9, *queueSzBefore*=0,
> *headMessageId*=null, *queueSzAfter*=0, *message*=GridNearTxFinishRequest
> [miniId=1, mvccSnapshot=null, super=GridDistributedTxFinishRequest
> [topVer=AffinityTopologyVersion [topVer=4, minorTopVer=0],
> futId=199a3155761-f379f312-ad4b-4181-acc5-0aacb3391f07, threadId=296,
> commitVer=null, invalidate=false, commit=true, baseVer=null, txSize=0,
> sys=false, plc=2, subjId=dda703a0-69ee-47cf-9b9a-bf3dc9309feb,
> taskNameHash=0, flags=32, syncMode=FULL_SYNC, txState=IgniteTxStateImpl
> [activeCacheIds=[644280847], recovery=false, mvccEnabled=false, txMap=HashSet
> [IgniteTxEntry [key=KeyCacheObjectImpl [part=8, val=8, hasValBytes=true],
> cacheId=644280847, txKey=IgniteTxKey [key=KeyCacheObjectImpl [part=8, val=8,
> hasValBytes=true], cacheId=644280847], val=[op=READ, val=null],
> prevVal=[op=NOOP, val=null], oldVal=[op=NOOP, val=null],
> entryProcessorsCol=null, ttl=-1, conflictExpireTime=-1, conflictVer=null,
> explicitVer=null, dhtVer=null, filters=CacheEntryPredicate[] [],
> filtersPassed=false, filtersSet=false, entry=GridCacheMapEntry
> [key=KeyCacheObjectImpl [part=8, val=8, hasValBytes=true], val=null,
> ver=GridCacheVersion [topVer=0, order=0, nodeOrder=0], hash=8,
> extras=GridCacheObsoleteEntryExtras [obsoleteVer=GridCacheVersion
> [topVer=2147483647, order=0, nodeOrder=0]], flags=2]GridDistributedCacheEntry
> [super=]GridDhtCacheEntry [rdrs=ReaderId[] [], part=8, super=], prepared=0,
> locked=false, nodeId=null, locMapped=false, expiryPlc=null,
> transferExpiryPlc=false, flags=0, partUpdateCntr=0, serReadVer=null,
> xidVer=GridCacheVersion{code}
> # JMX tools and a command line interface to get these metrics and print a
> statistics view.
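The first two items above (per-type histograms and slow-message dumping) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the proposed Ignite implementation: the class MessageProfiler, its method names, the millisecond bucket bounds and the shortened dump format are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLongArray;

/** Hypothetical sketch for illustration; not an Ignite class. */
final class MessageProfiler {
    /** Ascending upper bounds of histogram buckets, in milliseconds (assumed unit). */
    private final long[] bounds;

    /** Wait or processing time above this value triggers a slow-message dump. */
    private final long slowThresholdMs;

    /** Queue waiting time histograms keyed by message type. */
    private final Map<String, AtomicLongArray> waitHist = new ConcurrentHashMap<>();

    /** Processing time histograms keyed by message type. */
    private final Map<String, AtomicLongArray> procHist = new ConcurrentHashMap<>();

    /** Collected slow-message lines; a real implementation would write to a log. */
    private final List<String> slowMessages = Collections.synchronizedList(new ArrayList<>());

    MessageProfiler(long[] bounds, long slowThresholdMs) {
        this.bounds = bounds.clone();
        this.slowThresholdMs = slowThresholdMs;
    }

    /** Records one processed message; all timestamps are in milliseconds. */
    void record(String msgType, long enqueueTs, long startTs, long endTs,
        int queueSzBefore, int queueSzAfter) {
        long waitTime = startTs - enqueueTs;
        long procTime = endTs - startTs;

        increment(waitHist, msgType, waitTime);
        increment(procHist, msgType, procTime);

        if (waitTime > slowThresholdMs || procTime > slowThresholdMs) {
            slowMessages.add("Slow message: enqueueTs=" + enqueueTs
                + ", waitTime=" + waitTime + ", procTime=" + procTime
                + ", queueSzBefore=" + queueSzBefore
                + ", queueSzAfter=" + queueSzAfter + ", message=" + msgType);
        }
    }

    /** Increments the bucket for the given value; the last bucket is the overflow bucket. */
    private void increment(Map<String, AtomicLongArray> hist, String msgType, long val) {
        AtomicLongArray counts =
            hist.computeIfAbsent(msgType, t -> new AtomicLongArray(bounds.length + 1));

        int idx = 0;

        while (idx < bounds.length && val > bounds[idx])
            idx++;

        counts.incrementAndGet(idx);
    }

    long[] waitTimeHistogram(String msgType) {
        return snapshot(waitHist.get(msgType));
    }

    long[] procTimeHistogram(String msgType) {
        return snapshot(procHist.get(msgType));
    }

    List<String> slowMessages() {
        synchronized (slowMessages) {
            return new ArrayList<>(slowMessages);
        }
    }

    /** Copies counters into a plain array; unknown message types yield all zeros. */
    private long[] snapshot(AtomicLongArray counts) {
        long[] res = new long[bounds.length + 1];

        if (counts != null) {
            for (int i = 0; i < res.length; i++)
                res[i] = counts.get(i);
        }

        return res;
    }
}
```

With bounds {1, 10, 100} and a threshold of 50, a call such as record("GridNearTxFinishRequest", 0, 0, 305, 0, 0) would increment the overflow bucket of the processing-time histogram and add one slow-message line. The counters use AtomicLongArray so that executor threads could update them without locking, which seems in keeping with the "lightweight" goal, though the actual design may differ.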
--
This message was sent by Atlassian Jira
(v8.3.4#803005)