[
https://issues.apache.org/jira/browse/MESOS-110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13263273#comment-13263273
]
[email protected] commented on MESOS-110:
-----------------------------------------------------
bq. On 2012-04-25 22:11:01, Benjamin Hindman wrote:
bq. > I've only gotten halfway through ... but there is a bunch here already.
I'd like to break this up into at least four patches. (1) The utils stuff that
was added. (2) The master changes. (3) The slave::path namespace stuff. (3) The
status update manager API + implementation (but not the slave using it yet).
And (4) the slave using each of these components, and the executor changes that
are included.
bq. >
bq. > These comments are across all of those patches, but I'll make future
passes on each of those components.
addressed the comments for the utils.hpp part. Will send a review for utils.hpp
and protobuf_utils.hpp (forgot to include it in this review) shortly.
bq. On 2012-04-25 22:11:01, Benjamin Hindman wrote:
bq. > src/common/utils.hpp, line 211
bq. > <https://reviews.apache.org/r/4462/diff/3/?file=103024#file103024line211>
bq. >
bq. > Add that this is at the current file position of the file descriptor.
done
bq. On 2012-04-25 22:11:01, Benjamin Hindman wrote:
bq. > src/common/utils.hpp, line 218
bq. > <https://reviews.apache.org/r/4462/diff/3/?file=103024#file103024line218>
bq. >
bq. > How is this helpful? (If this came from my code, it should be
removed.)
it did come from your code. killed it.
bq. On 2012-04-25 22:11:01, Benjamin Hindman wrote:
bq. > src/common/utils.hpp, line 259
bq. > <https://reviews.apache.org/r/4462/diff/3/?file=103024#file103024line259>
bq. >
bq. > Blah.
fixed format.
bq. On 2012-04-25 22:11:01, Benjamin Hindman wrote:
bq. > src/common/utils.hpp, line 253
bq. > <https://reviews.apache.org/r/4462/diff/3/?file=103024#file103024line253>
bq. >
bq. > I'd prefer if this did not seek to the beginning and read the file,
but rather read from the current position until the end (and have the comment
say as much).
line 263 was a bug. fixed it. this line should make sense now!
bq. On 2012-04-25 22:11:01, Benjamin Hindman wrote:
bq. > src/common/utils.hpp, line 264
bq. > <https://reviews.apache.org/r/4462/diff/3/?file=103024#file103024line264>
bq. >
bq. > This looks like a bug ('offset' as the third argument?).
good catch! fixed.
bq. On 2012-04-25 22:11:01, Benjamin Hindman wrote:
bq. > src/common/utils.hpp, line 267
bq. > <https://reviews.apache.org/r/4462/diff/3/?file=103024#file103024line267>
bq. >
bq. > No need for the space here though.
fixed
bq. On 2012-04-25 22:11:01, Benjamin Hindman wrote:
bq. > src/common/utils.hpp, line 276
bq. > <https://reviews.apache.org/r/4462/diff/3/?file=103024#file103024line276>
bq. >
bq. > Again, should be killed (only makes sense in a macro).
killed
bq. On 2012-04-25 22:11:01, Benjamin Hindman wrote:
bq. > src/common/utils.hpp, line 290
bq. > <https://reviews.apache.org/r/4462/diff/3/?file=103024#file103024line290>
bq. >
bq. > You should refactor the protobuf::read and protobuf::write to use
these versions of read and write now as well.
didnt get u?
bq. On 2012-04-25 22:11:01, Benjamin Hindman wrote:
bq. > src/common/utils.hpp, line 329
bq. > <https://reviews.apache.org/r/4462/diff/3/?file=103024#file103024line329>
bq. >
bq. > man 3 dirname (and use it please).
aha..done
bq. On 2012-04-25 22:11:01, Benjamin Hindman wrote:
bq. > src/common/utils.hpp, line 532
bq. > <https://reviews.apache.org/r/4462/diff/3/?file=103024#file103024line532>
bq. >
bq. > s/file_pattern/pattern
done
bq. On 2012-04-25 22:11:01, Benjamin Hindman wrote:
bq. > src/common/utils.hpp, line 534
bq. > <https://reviews.apache.org/r/4462/diff/3/?file=103024#file103024line534>
bq. >
bq. > s/result/results
done
bq. On 2012-04-25 22:11:01, Benjamin Hindman wrote:
bq. > src/common/utils.hpp, line 536
bq. > <https://reviews.apache.org/r/4462/diff/3/?file=103024#file103024line536>
bq. >
bq. > Kill.
done
bq. On 2012-04-25 22:11:01, Benjamin Hindman wrote:
bq. > src/common/utils.hpp, line 538
bq. > <https://reviews.apache.org/r/4462/diff/3/?file=103024#file103024line538>
bq. >
bq. > Why not return a Try instead?
done
bq. On 2012-04-25 22:11:01, Benjamin Hindman wrote:
bq. > src/common/utils.hpp, line 546
bq. > <https://reviews.apache.org/r/4462/diff/3/?file=103024#file103024line546>
bq. >
bq. > Why is this a hack?
because we are using exists() function to check isDir() semantics, based on the
knowledge that the entry always exists.
bq. On 2012-04-25 22:11:01, Benjamin Hindman wrote:
bq. > src/common/utils.hpp, line 549
bq. > <https://reviews.apache.org/r/4462/diff/3/?file=103024#file103024line549>
bq. >
bq. > s/p/result or s/p/path
done
bq. On 2012-04-25 22:11:01, Benjamin Hindman wrote:
bq. > src/common/utils.hpp, line 555
bq. > <https://reviews.apache.org/r/4462/diff/3/?file=103024#file103024line555>
bq. >
bq. > Kill.
done
- Vinod
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4462/#review7232
-----------------------------------------------------------
On 2012-04-19 16:53:07, Vinod Kone wrote:
bq.
bq. -----------------------------------------------------------
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/4462/
bq. -----------------------------------------------------------
bq.
bq. (Updated 2012-04-19 16:53:07)
bq.
bq.
bq. Review request for mesos, Benjamin Hindman and John Sirois.
bq.
bq.
bq. Summary
bq. -------
bq.
bq. Sorry for the huge CL!
bq.
bq. Slave restarts now supports recovery!
bq. --> Non-disruptive restart means running tasks are not lost
bq. --> Re-connects with live executors
bq. --> Checkpoints and reliably sends status updates
bq. --> Ability to kill executors if the slave upgrade is incompatible with
running executors
bq.
bq.
bq. This addresses bug mesos-110.
bq. https://issues.apache.org/jira/browse/mesos-110
bq.
bq.
bq. Diffs
bq. -----
bq.
bq. src/Makefile.am d5edaa2
bq. src/common/hashset.hpp 1feb610
bq. src/common/utils.hpp 1d81e21
bq. src/exec/exec.cpp e8db407
bq. src/launcher/launcher.cpp a141b9a
bq. src/local/local.hpp 55f9eaf
bq. src/local/local.cpp affe432
bq. src/master/master.cpp 4dc9ee0
bq. src/messages/messages.proto 87e1548
bq. src/sched/sched.cpp dcadb10
bq. src/scripts/killtree.sh bceae9d
bq. src/slave/constants.hpp f0c8679
bq. src/slave/http.cpp 19c48a0
bq. src/slave/isolation_module.hpp c896908
bq. src/slave/lxc_isolation_module.hpp b7beefe
bq. src/slave/lxc_isolation_module.cpp 66a2a89
bq. src/slave/main.cpp 85cba25
bq. src/slave/process_based_isolation_module.hpp f6f9554
bq. src/slave/process_based_isolation_module.cpp 2b37d42
bq. src/slave/slave.hpp 279bc7b
bq. src/slave/slave.cpp 3358ec4
bq. src/slave/statusupdates_manager.hpp PRE-CREATION
bq. src/slave/statusupdates_manager.cpp PRE-CREATION
bq. src/tests/external_tests.cpp d1b20e4
bq. src/tests/fault_tolerance_tests.cpp 6772daf
bq. src/tests/slave_restart_tests.cpp PRE-CREATION
bq. src/tests/utils.hpp e81ec82
bq.
bq. Diff: https://reviews.apache.org/r/4462/diff
bq.
bq.
bq. Testing
bq. -------
bq.
bq. make check.
bq.
bq. Note that only the new test in tests/slave_restart_tests.cpp engages in
recovery!
bq.
bq. Recovery is disabled for old tests (though they still checkpoint relevant
info!)
bq.
bq.
bq. Thanks,
bq.
bq. Vinod
bq.
bq.
> Mesos deploys should not restart tasks
> --------------------------------------
>
> Key: MESOS-110
> URL: https://issues.apache.org/jira/browse/MESOS-110
> Project: Mesos
> Issue Type: Improvement
> Components: framework
> Reporter: Rob Benson
> Assignee: Vinod Kone
>
> Running a long-lived service on Mesos has a significant drawback right now in
> that Mesos build deploys restart your tasks. This could lead to nontrivial
> outages for services that have a high warm-up time. Basically everything
> would need a graceful restart mechanism that basically allows a
> shutdown/restart with a new version of the code.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira