Kevin Sweeney created AURORA-722:
------------------------------------

             Summary: snapshot performance issues
                 Key: AURORA-722
                 URL: https://issues.apache.org/jira/browse/AURORA-722
             Project: Aurora
          Issue Type: Bug
          Components: Scheduler
            Reporter: Kevin Sweeney
            Assignee: Kevin Sweeney
             Fix For: 0.6.0


In one of our larger production clusters we're seeing snapshot performance
issues that cause the scheduler to fail over before completing a snapshot.

For background, the scheduler writes a binary-encoded Snapshot (from
api.thrift), compressed when -deflate_snapshots is enabled, to the Mesos
replicated log every hour (or at the interval set by -dlog_snapshot_interval).
This snapshot represents most of the scheduler's heap usage, including the
configuration for all tasks running in the cluster.

Add appropriate instrumentation to the snapshot routine and patch any obvious 
performance bottlenecks.
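
As a rough illustration of the kind of instrumentation meant here, the sketch
below times the serialize and deflate stages of a snapshot write separately.
It is written in Python for brevity and is not Aurora's code: the names
timed, write_snapshot, fake_serialize and timings_ms are hypothetical, and
zlib stands in for whatever compression -deflate_snapshots actually uses.

```python
import time
import zlib
from contextlib import contextmanager

# Hypothetical per-stage timings in milliseconds (illustrative names only).
timings_ms = {}


@contextmanager
def timed(stage):
    """Record the wall-clock duration of the enclosed block under `stage`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings_ms[stage] = (time.perf_counter() - start) * 1000.0


def write_snapshot(serialize, deflate=True):
    """Serialize a snapshot and optionally compress it, timing each stage."""
    with timed("serialize"):
        raw = serialize()
    payload = raw
    if deflate:
        with timed("deflate"):
            payload = zlib.compress(raw)
    return raw, payload


def fake_serialize():
    """Stand-in for binary-encoding the scheduler state (assumption)."""
    return b"task-config;" * 100_000


raw, payload = write_snapshot(fake_serialize)
for stage, ms in timings_ms.items():
    print(f"{stage}: {ms:.2f} ms")
print(f"raw={len(raw)} bytes, compressed={len(payload)} bytes")
```

Splitting the timings by stage is what makes a bottleneck visible: a single
end-to-end number would not show whether the time goes to encoding,
compression, or the log write itself.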



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
