[ 
https://issues.apache.org/jira/browse/SAMZA-963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15398321#comment-15398321
 ] 

Fred Ji commented on SAMZA-963:
-------------------------------

After discussing with [~jmaes] and [~nickpan47], we will add timers in 
KeyValueStorageEngine to capture latency at the upper level, rather than at 
the lower level in each raw store. 
The metrics we are going to add include:
get,
put,
delete,
flush,
all,
range
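
To illustrate what timing at the upper level could look like, here is a 
minimal Java sketch. It is not the actual Samza code: RawStore, 
InMemoryStore, TimedStore and the "-ns" metric names are hypothetical 
stand-ins, and only get, put, delete and flush are shown.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical raw store interface (assumption, not a Samza type).
interface RawStore {
    String get(String key);
    void put(String key, String value);
    void delete(String key);
    void flush();
}

// Trivial in-memory raw store, used only to exercise the wrapper.
class InMemoryStore implements RawStore {
    private final Map<String, String> data = new HashMap<>();
    public String get(String key) { return data.get(key); }
    public void put(String key, String value) { data.put(key, value); }
    public void delete(String key) { data.remove(key); }
    public void flush() { /* nothing to flush in memory */ }
}

// Wrapper that times each call once, whatever raw store sits underneath.
// Not thread-safe; a real implementation would use proper metric objects.
class TimedStore implements RawStore {
    private final RawStore inner;
    // metric name -> {call count, total elapsed nanoseconds}
    private final Map<String, long[]> timers = new HashMap<>();

    TimedStore(RawStore inner) { this.inner = inner; }

    // Runs op and records its duration under the given metric name.
    private <T> T timed(String name, Supplier<T> op) {
        long start = System.nanoTime();
        try {
            return op.get();
        } finally {
            long[] t = timers.computeIfAbsent(name, k -> new long[2]);
            t[0] += 1;
            t[1] += System.nanoTime() - start;
        }
    }

    public String get(String key) {
        return timed("get-ns", () -> inner.get(key));
    }
    public void put(String key, String value) {
        timed("put-ns", () -> { inner.put(key, value); return null; });
    }
    public void delete(String key) {
        timed("delete-ns", () -> { inner.delete(key); return null; });
    }
    public void flush() {
        timed("flush-ns", () -> { inner.flush(); return null; });
    }

    long count(String name) {
        long[] t = timers.get(name);
        return t == null ? 0 : t[0];
    }
    long totalNanos(String name) {
        long[] t = timers.get(name);
        return t == null ? 0 : t[1];
    }
}

class TimedStoreDemo {
    public static void main(String[] args) {
        TimedStore store = new TimedStore(new InMemoryStore());
        store.put("k", "v");
        store.get("k");
        store.flush();
        System.out.println("get calls: " + store.count("get-ns"));
        System.out.println("flush total ns: " + store.totalNanos("flush-ns"));
    }
}
```

Because the timing lives in the wrapper, each raw store implementation 
stays unchanged and every operation is measured in one place.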


> Add timers to help identify performance issues with KV stores and producers.
> ----------------------------------------------------------------------------
>
>                 Key: SAMZA-963
>                 URL: https://issues.apache.org/jira/browse/SAMZA-963
>             Project: Samza
>          Issue Type: Improvement
>            Reporter: Jake Maes
>            Assignee: Fred Ji
>
> We have good timing metrics for many of the primary actions in the event loop:
> * Choose
> ** Deserialization
> ** Poll
> * Process
> * Window
> * Commit
> I've noticed a few things while analyzing job performance at LinkedIn:
> 1. We can usually identify problems in Choose using the sub metrics for 
> Deserialization and Poll. I don't think any work needs to be done here.
> 2. Slowness in Process or Window is usually caused by business logic (e.g. 
> side calls to remote DBs), but it can also be caused by slowness (e.g. 
> "stalls" in the case of RocksDB) in the KV Store. 
> 3. Slowness in Commit can be caused by slowness flushing the stores or 
> producers. It can also come from checkpointing. 
> #2 would be better if we had timers around all the main KV Store operations, 
> including get, put, delete, and the batch operations. Then we can isolate KV 
> Store performance from business logic performance. 
> #3 would be improved if we had timers around all the flushes. Specifically, I 
> think we should add a "flush-ns" metric to the KeyValueStoreMetrics and 
> update it from each of the stores. I noticed that KafkaSystemProducerMetrics 
> has a "flush-ns" metric, so the Kafka producer is covered. 
> To summarize, this ticket is to add metrics around all KV Store operations, 
> not just for user operations like get/put, but flush as well. 
> Related work: SAMZA-449



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
