Repository: mesos
Updated Branches:
  refs/heads/master 0426155cf -> 6e1041704


Delete framework data in TaskStatus to avoid OOM

Spark was found to use TaskStatus.data to transfer computed results. The
master keeps a copy of each status, so mesos-master's resident (RES) memory
grows quickly until the process is killed by the OOM killer.

Review: https://reviews.apache.org/r/25184


Project: http://git-wip-us.apache.org/repos/asf/mesos/repo
Commit: http://git-wip-us.apache.org/repos/asf/mesos/commit/8538eed6
Tree: http://git-wip-us.apache.org/repos/asf/mesos/tree/8538eed6
Diff: http://git-wip-us.apache.org/repos/asf/mesos/diff/8538eed6

Branch: refs/heads/master
Commit: 8538eed683eea99a340ff5272205113db0580a25
Parents: 0426155
Author: Chengwei Yang <[email protected]>
Authored: Wed Oct 15 14:12:26 2014 -0500
Committer: Timothy St. Clair <[email protected]>
Committed: Wed Oct 15 14:12:26 2014 -0500

----------------------------------------------------------------------
 src/master/master.cpp | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/mesos/blob/8538eed6/src/master/master.cpp
----------------------------------------------------------------------
diff --git a/src/master/master.cpp b/src/master/master.cpp
index 1b1ce0d..efb90d6 100644
--- a/src/master/master.cpp
+++ b/src/master/master.cpp
@@ -4479,12 +4479,21 @@ void Master::updateTask(Task* task, const TaskStatus& status)
     !protobuf::isTerminalState(task->state()) &&
     protobuf::isTerminalState(status.state());
 
-  // TODO(brenden) Consider wiping the `data` and `message` fields?
+  // TODO(brenden) Consider wiping the `message` field?
   if (task->statuses_size() > 0 &&
       task->statuses(task->statuses_size() - 1).state() == status.state()) {
     task->mutable_statuses()->RemoveLast();
   }
   task->add_statuses()->CopyFrom(status);
+
+  // Delete the data field (possibly very large, since it is set by the
+  // framework); the master has no use for it, and keeping it risks OOM.
+  // For example, if mesos-master runs on a machine with 4GB of free memory
+  // and every task stores 10MB of data in its TaskStatus, mesos-master will
+  // be killed by the OOM killer after about 400 tasks have finished.
+  // MESOS-1746
+  task->mutable_statuses(task->statuses_size() - 1)->clear_data();
+
   task->set_state(status.state());
 
   stats.tasks[status.state()]++;
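
The hunk above does two things: it replaces the last stored status when the
new one has the same state, and it clears the data payload from the stored
copy. The following is a minimal standalone sketch of that logic, not the
Mesos implementation. TaskState, StatusRecord, TaskRecord, recordStatus and
retainedPayloadBytes are hypothetical stand-ins for the protobuf-generated
types and for Master::updateTask; only the control flow mirrors the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for the protobuf-generated Mesos types.
enum class TaskState { STAGING, RUNNING, FINISHED };

struct StatusRecord {
  TaskState state;
  std::string data;     // Framework-defined payload, possibly very large.
  std::string message;  // Kept as-is, as in the patch.
};

struct TaskRecord {
  TaskState state = TaskState::STAGING;
  std::vector<StatusRecord> statuses;
};

// Stores `status` in the task's history following the patched logic:
// drop a trailing entry that has the same state, append the new status,
// then clear the payload of the stored copy.
void recordStatus(TaskRecord& task, const StatusRecord& status) {
  if (!task.statuses.empty() &&
      task.statuses.back().state == status.state) {
    task.statuses.pop_back();
  }
  task.statuses.push_back(status);

  // Counterpart of clear_data() on the last stored status.
  task.statuses.back().data.clear();
  assert(task.statuses.back().data.empty());

  task.state = status.state;
}

// Total payload bytes still held in the task's status history.
std::size_t retainedPayloadBytes(const TaskRecord& task) {
  std::size_t total = 0;
  for (const StatusRecord& status : task.statuses) {
    total += status.data.size();
  }
  return total;
}
```

In this sketch the caller's status object is left untouched; only the copy
held in the history loses its payload, so retainedPayloadBytes stays at zero
no matter how large the incoming payloads are.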
