Xinyu Liu updated SAMZA-963:
    Affects Version/s: 0.11

> Add timers to help identify performance issues with KV stores and producers.
> ----------------------------------------------------------------------------
>                 Key: SAMZA-963
>                 URL: https://issues.apache.org/jira/browse/SAMZA-963
>             Project: Samza
>          Issue Type: Improvement
>    Affects Versions: 0.11
>            Reporter: Jake Maes
>            Assignee: Fred Ji
>             Fix For: 0.11.0
>         Attachments: SAMZA-963.1.patch, SAMZA-963.2.patch
> We have good timing metrics for many of the primary actions in the event loop:
> * Choose
> ** Deserialization
> ** Poll
> * Process
> * Window
> * Commit
> I've noticed a few things while analyzing job performance at LinkedIn:
> 1. We can usually identify problems in Choose using the sub-metrics for
> Deserialization and Poll. I don't think any work needs to be done here.
> 2. Slowness in Process or Window is usually caused by business logic (e.g.
> side calls to remote DBs), but it can also come from the KV Store itself
> (e.g. "stalls" in the case of RocksDB).
> 3. Slowness in Commit can be caused by slow flushes of the stores or
> producers. It can also come from checkpointing.
> #2 would be easier to diagnose if we had timers around all the main KV Store
> operations, including get, put, delete, and the batch operations. Then we
> could isolate KV Store performance from business logic performance.
> #3 would be easier to diagnose if we had timers around all the flushes.
> Specifically, I think we should add a "flush-ns" metric to
> KeyValueStoreMetrics and update it from each of the stores.
> KafkaSystemProducerMetrics already has a "flush-ns" metric, so the Kafka
> producer is covered.
> To summarize, this ticket is to add metrics around all KV Store operations:
> not just user operations like get/put, but flush as well.
> Related work: SAMZA-449
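
The timers proposed in the description could be sketched roughly as below. This is a minimal illustration, not the attached patch: `NanoTimer`, `SimpleKvStore`, `InMemoryKvStore`, and `TimedKvStore` are hypothetical stand-ins that do not use Samza's actual store or metrics classes, and the metric names other than "flush-ns" ("get-ns", "put-ns", "delete-ns") are assumed by analogy. The idea is a wrapper that times each store call with System.nanoTime() so store latency can be read separately from business logic latency.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for a metrics timer; not Samza's metrics API.
final class NanoTimer {
    private final String name;
    private final AtomicLong totalNanos = new AtomicLong();
    private final AtomicLong count = new AtomicLong();

    NanoTimer(String name) { this.name = name; }

    // Record one timed call.
    void update(long elapsedNanos) {
        totalNanos.addAndGet(elapsedNanos);
        count.incrementAndGet();
    }

    String name() { return name; }
    long count() { return count.get(); }
    long totalNanos() { return totalNanos.get(); }
}

// Hypothetical minimal store interface (batch operations omitted for brevity).
interface SimpleKvStore<K, V> {
    V get(K key);
    void put(K key, V value);
    void delete(K key);
    void flush();
}

// Trivial in-memory store used only to make the sketch runnable.
final class InMemoryKvStore<K, V> implements SimpleKvStore<K, V> {
    private final Map<K, V> data = new HashMap<>();
    public V get(K key) { return data.get(key); }
    public void put(K key, V value) { data.put(key, value); }
    public void delete(K key) { data.remove(key); }
    public void flush() { /* nothing to flush in memory */ }
}

// Wrapper that times every call to the underlying store, including flush.
final class TimedKvStore<K, V> implements SimpleKvStore<K, V> {
    final NanoTimer getNs = new NanoTimer("get-ns");       // assumed name
    final NanoTimer putNs = new NanoTimer("put-ns");       // assumed name
    final NanoTimer deleteNs = new NanoTimer("delete-ns"); // assumed name
    final NanoTimer flushNs = new NanoTimer("flush-ns");   // name from the ticket
    private final SimpleKvStore<K, V> delegate;

    TimedKvStore(SimpleKvStore<K, V> delegate) { this.delegate = delegate; }

    public V get(K key) {
        long start = System.nanoTime();
        try { return delegate.get(key); }
        finally { getNs.update(System.nanoTime() - start); }
    }

    public void put(K key, V value) {
        long start = System.nanoTime();
        try { delegate.put(key, value); }
        finally { putNs.update(System.nanoTime() - start); }
    }

    public void delete(K key) {
        long start = System.nanoTime();
        try { delegate.delete(key); }
        finally { deleteNs.update(System.nanoTime() - start); }
    }

    public void flush() {
        long start = System.nanoTime();
        try { delegate.flush(); }
        finally { flushNs.update(System.nanoTime() - start); }
    }
}
```

Recording the elapsed time in a finally block means a call that throws is still counted, which is usually what is wanted when hunting for stalls.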

