Kevin Sweeney created AURORA-722:
------------------------------------
Summary: snapshot performance issues
Key: AURORA-722
URL: https://issues.apache.org/jira/browse/AURORA-722
Project: Aurora
Issue Type: Bug
Components: Scheduler
Reporter: Kevin Sweeney
Assignee: Kevin Sweeney
Fix For: 0.6.0
In one of our larger production clusters, snapshots are slow enough that the
scheduler fails over before it finishes writing one.
For background: every hour by default (configurable with
-dlog_snapshot_interval), the scheduler writes a binary-encoded Snapshot struct
(defined in api.thrift) to the Mesos replicated log, compressing it first when
-deflate_snapshots is enabled. The data captured in the snapshot accounts for
most of the scheduler's heap usage, including the configuration of every task
running in the cluster.
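To make the optional compression step concrete, here is a minimal sketch that uses only the JDK's java.util.zip streams. It is not the scheduler's actual code. The class name SnapshotEncoding is hypothetical, and a plain byte[] stands in for the already binary-encoded Thrift Snapshot. The output is what would be appended to the replicated log.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical sketch, not Aurora's implementation: the input bytes stand in
// for a binary-encoded Thrift Snapshot produced by the serializer.
final class SnapshotEncoding {
  private SnapshotEncoding() {}

  /** Deflates the serialized snapshot when compression is enabled (cf. -deflate_snapshots). */
  static byte[] encode(byte[] serializedSnapshot, boolean deflate) throws IOException {
    if (!deflate) {
      return serializedSnapshot; // stored as-is when compression is off
    }
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    // Closing the DeflaterOutputStream finishes the deflate stream and flushes it.
    try (DeflaterOutputStream deflater = new DeflaterOutputStream(out)) {
      deflater.write(serializedSnapshot);
    }
    return out.toByteArray();
  }

  /** Reverses encode(), as a recovering scheduler would before deserializing. */
  static byte[] decode(byte[] stored, boolean deflated) throws IOException {
    if (!deflated) {
      return stored;
    }
    try (InflaterInputStream inflater =
        new InflaterInputStream(new ByteArrayInputStream(stored))) {
      return inflater.readAllBytes(); // Java 9+
    }
  }
}
```

Compression trades CPU time during the snapshot for a smaller log write. That is one reason the serialization and deflate stages are worth timing separately.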
Add appropriate instrumentation to the snapshot routine and patch any obvious
performance bottlenecks.
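One way to approach the instrumentation is to time each stage of the snapshot routine under a name, so the slow stage shows up directly. The sketch below assumes that approach. SnapshotTimer and its methods are hypothetical and do not come from Aurora's stats library. It relies only on System.nanoTime and JDK collections.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, not Aurora's instrumentation: accumulates wall-clock
// time per named snapshot stage, preserving the order stages first ran.
final class SnapshotTimer {
  @FunctionalInterface
  interface Stage<T> {
    T run() throws Exception;
  }

  private final Map<String, Long> nanosByStage = new LinkedHashMap<>();

  /** Runs one stage and records its duration, even if the stage throws. */
  <T> T time(String stageName, Stage<T> stage) throws Exception {
    long start = System.nanoTime();
    try {
      return stage.run();
    } finally {
      // Repeated runs of the same stage are summed.
      nanosByStage.merge(stageName, System.nanoTime() - start, Long::sum);
    }
  }

  /** Returns recorded durations in milliseconds (truncated), in first-run order. */
  Map<String, Long> millisByStage() {
    Map<String, Long> result = new LinkedHashMap<>();
    nanosByStage.forEach((name, nanos) -> result.put(name, TimeUnit.NANOSECONDS.toMillis(nanos)));
    return Collections.unmodifiableMap(result);
  }
}
```

In a snapshot routine, each step (for example, reading the stores, Thrift serialization, deflating, and the log append) would be wrapped in its own time(...) call. The resulting per-stage durations would then be exported or logged after each snapshot.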
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)