> On 2012-03-24 23:51:02, Matei Zaharia wrote:
> > I really like the changes overall, but are you sure that removing code from
> > the error callback is a good idea? If people want to respond to particular
> > errors in particular ways, relying on a string to identify the error is
> > flimsy, because we might change the wording in the string.
> >
> > I guess it also depends on the semantics of error(). If we only use it for
> > "fatal" errors, such as the framework giving a weird offer reply, where we
> > immediately drop that framework, then it's fine to just have text. But if
> > we eventually want to use it for non-fatal ones where we expect some
> > frameworks to be able to recover, we might want to give them a numeric
> > code. One other option might be to make the argument to error() a protobuf
> > (ErrorInfo).
I don't think we want to get into the business of requiring people to deal with
error conditions non-locally. This seems like a horrible idea, and very hard to
reason about. Consider:
void resourceOffers(...) {
...;
driver.launchTasks(...);
...;
}
...
void error(..., int code, ...) {
if (code somehow means the launchTask thing I did some time ago up in
resourceOffers) {
do something;
}
}
For this reason, I think error should only be for "fatal" errors. Note that
with framework failover it's perfectly fine for a framework that was aborted
due to an error to recreate another driver and try again.
- Benjamin
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4474/#review6338
-----------------------------------------------------------
On 2012-03-24 23:31:16, Benjamin Hindman wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/4474/
> -----------------------------------------------------------
>
> (Updated 2012-03-24 23:31:16)
>
>
> Review request for mesos, Andy Konwinski, Charles Reiss, Matei Zaharia, John
> Sirois, and Vinod Kone.
>
>
> Summary
> -------
>
> Apologies for a rather large diff. I meant to do a few independent changes
> but they all got mangled into one because of subtle dependencies between them.
>
> * Changed MesosSchedulerDriver to take a FrameworkInfo instead of a
> name and possibly FrameworkID. This required adding an optional
> FrameworkID to FrameworkInfo. The hope is that with this change can
> more easily update the FrameworkInfo semantics without requiring
> recompilation. For example, as we add better support for auth
> entication we should hopefully be able to do that directly in
> FrameworkInfo. We also simplified the C++ MesosSchedulerDriver by
> removing unused constructors (that also don't have versions in
> either Java or Python).
>
> * Renamed the 'url' parameter of MesosSchedulerDriver to 'master' and
> updated the allowable schemes of 'master' to include a host:port
> pair (we internally add 'mas ter@') and replaced zoo:// with zk://
> and zoofile:// with file:// (but with different semantics, see the
> documentation). We also changed the master webui to show t he
> host:port and removed the PID and updated and added tests related to
> creating MasterDetectors from the above schemes.
>
> * Replaced the --url argument of mesos-master with --zk argument for
> clarity.
>
> * Removed 'code' from the Scheduler/Executor::error callback since it
> was not being used (nor did it seem useful in the future).
>
> * Made some naming consistency changes across the C++, Java, and
> Python interfaces (sched -> scheduler, exec -> executor, etc).
>
>
> Diffs
> -----
>
> hadoop/hadoop-0.20.205.0.patch 03f3949
> include/mesos/executor.hpp ef7bdf9
> include/mesos/mesos.proto 23aad17
> include/mesos/scheduler.hpp 5bcf065
> src/Makefile.am 090c07a
> src/deploy/mesos-env.sh c3dfda2
> src/detector/detector.hpp 39c19e4
> src/detector/detector.cpp 421f035
> src/detector/url_processor.hpp 3be0c09
> src/detector/url_processor.cpp 1a83b26
> src/examples/java/TestExceptionFramework.java d15b132
> src/examples/java/TestExecutor.java f57cda4
> src/examples/java/TestFramework.java f86646b
> src/examples/java/TestMultipleExecutorsFramework.java cdbcc48
> src/examples/long_lived_executor.cpp 4b4abad
> src/examples/long_lived_framework.cpp 2775f5a
> src/examples/no_executor_framework.cpp fb31b25
> src/examples/python/test_framework.py 0d71831
> src/examples/test_executor.cpp ba54004
> src/examples/test_framework.cpp 622f6ac
> src/exec/exec.cpp 060707ad
> src/java/jni/construct.cpp 90b7e9d
> src/java/jni/org_apache_mesos_Log.cpp 4c9a4e1
> src/java/jni/org_apache_mesos_MesosExecutorDriver.cpp 579ebbe
> src/java/jni/org_apache_mesos_MesosSchedulerDriver.cpp 547d54b
> src/java/src/org/apache/mesos/Executor.java 3db3068
> src/java/src/org/apache/mesos/MesosExecutorDriver.java 6781dec
> src/java/src/org/apache/mesos/MesosSchedulerDriver.java 129a108
> src/java/src/org/apache/mesos/Scheduler.java b01468b
> src/launcher/executor.cpp da4203c
> src/master/main.cpp f31212b
> src/master/master.hpp 320d760
> src/master/master.cpp 6110c17
> src/mesos/main.cpp 9e012f8
> src/messages/messages.proto 7f9cffe
> src/python/native/mesos_scheduler_driver_impl.cpp 9e088ef
> src/python/native/proxy_executor.hpp 32df90b
> src/python/native/proxy_executor.cpp d453406
> src/python/native/proxy_scheduler.hpp d0e5dfd
> src/python/native/proxy_scheduler.cpp 9717f44
> src/python/src/mesos.py 25d762b
> src/sched/sched.cpp 9440381
> src/slave/main.cpp 6b4f37a
> src/slave/slave.cpp 28900aa
> src/tests/attributes_test.cpp 1e87d923
> src/tests/attributes_tests.cpp PRE-CREATION
> src/tests/exception_tests.cpp c14bacc
> src/tests/fault_tolerance_tests.cpp 9e7d181
> src/tests/master_detector_tests.cpp PRE-CREATION
> src/tests/master_tests.cpp 57db468
> src/tests/resource_offers_tests.cpp a405888
> src/tests/url_processor_tests.cpp 573524b
> src/tests/utils.hpp 9241567
> src/tests/zookeeper_url_tests.cpp PRE-CREATION
> src/webui/master/index.tpl bc0cd0b
> src/zookeeper/authentication.hpp 944d95e
> src/zookeeper/url.hpp PRE-CREATION
>
> Diff: https://reviews.apache.org/r/4474/diff
>
>
> Testing
> -------
>
> make check && make hadoop
>
>
> Thanks,
>
> Benjamin
>
>