Repository: mesos
Updated Branches:
  refs/heads/master 0426155cf -> 6e1041704
Delete framework data in TaskStatus to avoid OOM

Spark uses TaskStatus.data to transfer computed results. Because
mesos-master keeps copies of the TaskStatus updates it receives, including
their data, its resident (RES) memory grows quickly until the process is
killed by the OOM killer.

Review: https://reviews.apache.org/r/25184


Project: http://git-wip-us.apache.org/repos/asf/mesos/repo
Commit: http://git-wip-us.apache.org/repos/asf/mesos/commit/8538eed6
Tree: http://git-wip-us.apache.org/repos/asf/mesos/tree/8538eed6
Diff: http://git-wip-us.apache.org/repos/asf/mesos/diff/8538eed6

Branch: refs/heads/master
Commit: 8538eed683eea99a340ff5272205113db0580a25
Parents: 0426155
Author: Chengwei Yang <[email protected]>
Authored: Wed Oct 15 14:12:26 2014 -0500
Committer: Timothy St. Clair <[email protected]>
Committed: Wed Oct 15 14:12:26 2014 -0500

----------------------------------------------------------------------
 src/master/master.cpp | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/mesos/blob/8538eed6/src/master/master.cpp
----------------------------------------------------------------------
diff --git a/src/master/master.cpp b/src/master/master.cpp
index 1b1ce0d..efb90d6 100644
--- a/src/master/master.cpp
+++ b/src/master/master.cpp
@@ -4479,12 +4479,21 @@ void Master::updateTask(Task* task, const TaskStatus& status)
     !protobuf::isTerminalState(task->state()) &&
     protobuf::isTerminalState(status.state());
 
-  // TODO(brenden) Consider wiping the `data` and `message` fields?
+  // TODO(brenden) Consider wiping the `message` field?
   if (task->statuses_size() > 0 &&
       task->statuses(task->statuses_size() - 1).state() == status.state()) {
     task->mutable_statuses()->RemoveLast();
   }
   task->add_statuses()->CopyFrom(status);
+
+  // Delete data (maybe very large since it's stored by on-top framework) we
+  // are not interested in to avoid OOM.
+  // For example: mesos-master is running on a machine with 4GB free memory,
+  // if every task stores 10MB data into TaskStatus, then mesos-master will be
+  // killed by OOM killer after have 400 tasks finished.
+  // MESOS-1746
+  task->mutable_statuses(task->statuses_size() - 1)->clear_data();
+
   task->set_state(status.state());
 
   stats.tasks[status.state()]++;
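The bookkeeping changed by this patch can be modeled in a few lines. This is a minimal sketch, not Mesos code: `TaskState`, `TaskStatus`, `Task`, `updateTask`, `retainedDataBytes`, and `simulate` are hypothetical, simplified stand-ins for the generated protobuf classes and for `Master::updateTask()`, assuming the master holds every status it appends for as long as it tracks the task. It shows that after the change the per-task status history (including `message`) survives while the framework `data` payload does not.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins; the real types are generated from mesos.proto.
enum class TaskState { STAGING, RUNNING, FINISHED };

struct TaskStatus {
  TaskState state;
  std::string message;  // Kept by the patch (see the remaining TODO).
  std::string data;     // Opaque framework payload, e.g. a Spark result.
};

struct Task {
  TaskState state = TaskState::STAGING;
  std::vector<TaskStatus> statuses;
};

// Models the patched logic: collapse consecutive updates that carry the
// same state, append a copy of the new status, then drop its `data`.
void updateTask(Task* task, const TaskStatus& status) {
  if (!task->statuses.empty() &&
      task->statuses.back().state == status.state) {
    task->statuses.pop_back();
  }
  task->statuses.push_back(status);

  // Counterpart of clear_data() in the patch: empty the payload copy.
  task->statuses.back().data.clear();

  task->state = status.state;
}

// Sums the logical size of every retained `data` field. This counts
// string sizes only; a std::string may keep its capacity after clear().
std::size_t retainedDataBytes(const std::vector<Task>& tasks) {
  std::size_t total = 0;
  for (const Task& task : tasks) {
    for (const TaskStatus& s : task.statuses) {
      total += s.data.size();
    }
  }
  return total;
}

// Runs numTasks tasks to FINISHED, each reporting payloadBytes of data,
// and returns how many payload bytes the model still holds afterwards.
std::size_t simulate(std::size_t numTasks, std::size_t payloadBytes) {
  std::vector<Task> tasks(numTasks);
  for (Task& task : tasks) {
    updateTask(&task, TaskStatus{TaskState::RUNNING, "", ""});
    updateTask(&task, TaskStatus{TaskState::FINISHED, "done",
                                 std::string(payloadBytes, 'x')});
  }
  return retainedDataBytes(tasks);
}
```

With the `clear()` line removed, `simulate(n, p)` returns `n * p`, so 400 tasks reporting 10 MB each would keep about 4 GB of payload alive, which is the scenario the patch comment describes.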
