[ https://issues.apache.org/jira/browse/AURORA-178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
David McLaughlin resolved AURORA-178.
-------------------------------------
Resolution: Fixed
This was shipped.
> Log/observe snapshot operations
> -------------------------------
>
> Key: AURORA-178
> URL: https://issues.apache.org/jira/browse/AURORA-178
> Project: Aurora
> Issue Type: Task
> Components: Scheduler
> Reporter: Jonathan Boulle
> Priority: Minor
> Labels: newbie
>
> Currently, snapshot operations of excessive duration aren't necessarily
> obvious in, e.g., the scheduler logs or dashboards. Since this is a potentially
> critical and dangerous operation (in some cases leading to ZooKeeper timeouts and
> scheduler suicide), it would be prudent to expose relevant information more
> readily (e.g., when the operations start and complete, and how long they take).
> From Zameer:
> {quote}The doSnapshot method of LogStorage is timed with the key
> "scheduler_log_snapshot". These are the stats it produces:
> scheduler_log_snapshot_events 19
> scheduler_log_snapshot_events_per_sec 0.0
> scheduler_log_snapshot_nanos_per_event 0.0
> scheduler_log_snapshot_nanos_total 373115257383
> scheduler_log_snapshot_nanos_total_per_sec 0.0
> scheduler_log_snapshot_persist_events 19
> scheduler_log_snapshot_persist_events_per_sec 0.0
> scheduler_log_snapshot_persist_nanos_per_event 0.0
> scheduler_log_snapshot_persist_nanos_total 339151517713
> scheduler_log_snapshot_persist_nanos_total_per_sec 0.0
> scheduler_log_snapshots 19
> Which metric should be tracked in our dashboard?
> {quote}
> From Bill F:
> {quote}A very long snapshot might never be reflected in those stats if a
> suicide happens midway through. The minimal fix would be to just LOG when a
> snapshot is about to commence.{quote}
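As a rough illustration of what Zameer's cumulative counters already say, the sketch below divides the nanosecond totals by the snapshot count to get an average duration per snapshot. It only uses the numbers quoted above. The class and method names are hypothetical and are not part of Aurora. The rate-based stats (`_per_sec`, `_per_event`) read 0.0 in that sample, so the totals are what carry information here.

```java
/** Hypothetical helper: derives average snapshot durations from the counters quoted in AURORA-178. */
public final class SnapshotStats {
    private SnapshotStats() {}

    /** Average duration in seconds, given a cumulative nanosecond total and an event count. */
    static double averageSeconds(long nanosTotal, long events) {
        if (events <= 0) {
            throw new IllegalArgumentException("events must be positive");
        }
        return nanosTotal / (double) events / 1_000_000_000.0;
    }

    public static void main(String[] args) {
        // Values copied from the scheduler_log_snapshot* stats in the issue.
        long snapshots = 19;                       // scheduler_log_snapshots
        long totalNanos = 373_115_257_383L;        // scheduler_log_snapshot_nanos_total
        long persistNanos = 339_151_517_713L;      // scheduler_log_snapshot_persist_nanos_total
        System.out.printf("avg snapshot: %.2f s%n", averageSeconds(totalNanos, snapshots));
        System.out.printf("avg persist:  %.2f s%n", averageSeconds(persistNanos, snapshots));
    }
}
```

Running `main` prints an average of about 19.64 s per snapshot, of which about 17.85 s (roughly 91%) is spent in the persist step. These are averages, though: as Bill notes, a snapshot that never finishes is not counted at all.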
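Bill's minimal fix can be sketched as a wrapper that logs before the snapshot starts and logs again, with the elapsed time, once it finishes or fails. This is a hypothetical illustration using `java.util.logging`. It is not the change that was actually shipped, and the `SnapshotLogging` and `timedSnapshot` names are assumptions. The key point is that the start line is written before any work begins. If the scheduler dies mid-snapshot, the log will show a "Starting" line with no matching completion line.

```java
import java.util.concurrent.TimeUnit;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical wrapper illustrating "log when a snapshot commences"; not Aurora's LogStorage code. */
public final class SnapshotLogging {
    private static final Logger LOG = Logger.getLogger(SnapshotLogging.class.getName());

    private SnapshotLogging() {}

    private static long elapsedMillis(long startNanos) {
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNanos);
    }

    /** Runs the snapshot, logging its start first and its duration afterwards; returns elapsed ms. */
    static long timedSnapshot(Runnable snapshot) {
        // Written before any work, so it survives even if the process is killed mid-snapshot.
        LOG.info("Starting log snapshot");
        long start = System.nanoTime();
        try {
            snapshot.run();
        } catch (RuntimeException e) {
            LOG.log(Level.WARNING, "Log snapshot failed after " + elapsedMillis(start) + " ms", e);
            throw e;
        }
        long elapsedMs = elapsedMillis(start);
        LOG.info("Log snapshot completed in " + elapsedMs + " ms");
        return elapsedMs;
    }

    public static void main(String[] args) {
        // Stand-in for the real snapshot work: just sleep briefly.
        long ms = timedSnapshot(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        System.out.println("snapshot took " + ms + " ms");
    }
}
```

Logging the duration complements the timed stats rather than replacing them. A threshold check (for example, warning when `elapsedMs` exceeds some limit) could be layered on top, but any such limit would be a local choice and is not specified in the issue.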
--
This message was sent by Atlassian JIRA
(v6.2#6252)