[ 
https://issues.apache.org/jira/browse/SAMZA-1356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jake Maes updated SAMZA-1356:
-----------------------------
    Description: 
There are a couple of problems that can affect our ability to troubleshoot 
state restore from changelog.

1. KeyValueStorageEngine logs a message for every 1M messages restored, but it 
doesn't print anything for smaller stores. We should add a message to report 
the final number of entries restored.

2. While the "restore-time" metric is a gauge, the KeyValueStorageEngineMetrics 
"messages-restored" and "messages-bytes" are both counters, and counters are 
often graphed in terms of deltas so the value disappears after one data point. 
Since these values only matter for the beginning of the job, we should switch 
them to gauges so the value is retained for later monitoring. 
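
To illustrate both points, here is a minimal, self-contained sketch. It is not 
Samza code: the class, method names, and log wording are all hypothetical, and 
the logging interval is a parameter so small stores can be shown. It assumes a 
dashboard that plots a counter as the difference between consecutive polls.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; none of these names are Samza's actual classes or APIs.
class RestoreMonitoringSketch {

    /**
     * Simulates restoring entryCount entries and returns the log lines that
     * would be emitted: one progress line per logInterval entries, plus a
     * final summary line that is always written.
     */
    static List<String> restore(long entryCount, long logInterval) {
        List<String> log = new ArrayList<>();
        long restored = 0;
        for (long i = 0; i < entryCount; i++) {
            restored++;
            if (restored % logInterval == 0) {
                log.add(restored + " entries restored...");
            }
        }
        // Point 1: the summary appears even when the store is smaller than
        // logInterval and no progress line was ever printed.
        log.add("Restore complete. " + restored + " entries restored.");
        return log;
    }

    /**
     * Point 2: what a delta-based graph shows for a counter, given the
     * cumulative totals seen at each poll (assumed to start from zero).
     */
    static long[] counterDeltas(long[] polledTotals) {
        long[] deltas = new long[polledTotals.length];
        long previous = 0;
        for (int i = 0; i < polledTotals.length; i++) {
            deltas[i] = polledTotals[i] - previous;
            previous = polledTotals[i];
        }
        return deltas;
    }

    public static void main(String[] args) {
        // A store with 2500 entries and a progress interval of 1000.
        restore(2500, 1000).forEach(System.out::println);

        // Restore finished before the first poll, so every poll sees 2500.
        long[] polls = {2500, 2500, 2500};
        // As a counter graphed by delta, only the first point is non-zero.
        System.out.println(Arrays.toString(counterDeltas(polls)));
        // As a gauge, the polled value itself is plotted at every point.
        System.out.println(Arrays.toString(polls));
    }
}
```

With a gauge, the restored count stays visible at every data point after 
startup, which is the behavior item 2 asks for.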



  was:
There are a couple problems that can affect our ability to troubleshoot state 
restore from changelog.

1. KeyValueStorageEngine logs a message for every 1M messages restored, but it 
doesn't print anything for smaller stores. We should add a message to report 
the final number of entries restored.

2. While the "restore-time" metric is a gauge, the KeyValueStorageEngineMetrics 
"messages-restored" and "messages-bytes" are both counters, and counters are 
often reported in terms of deltas so the value disappears after one data point. 
Since these values only matter for the beginning of the job, we should switch 
them to gauges so the value is retained for later monitoring. 




> Improve monitoring for state restore
> ------------------------------------
>
>                 Key: SAMZA-1356
>                 URL: https://issues.apache.org/jira/browse/SAMZA-1356
>             Project: Samza
>          Issue Type: Bug
>            Reporter: Jake Maes
>            Assignee: Jake Maes
>             Fix For: 0.13.1
>
>
> There are a couple of problems that can affect our ability to troubleshoot 
> state restore from changelog.
> 1. KeyValueStorageEngine logs a message for every 1M messages restored, but 
> it doesn't print anything for smaller stores. We should add a message to 
> report the final number of entries restored.
> 2. While the "restore-time" metric is a gauge, the 
> KeyValueStorageEngineMetrics "messages-restored" and "messages-bytes" are 
> both counters, and counters are often graphed in terms of deltas so the value 
> disappears after one data point. Since these values only matter for the 
> beginning of the job, we should switch them to gauges so the value is 
> retained for later monitoring. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
